CSV parser

Thursday, June 23, 2011

#.NET #C# #CSV

I’m currently working on a project where I needed a C# console application that could read an Excel CSV (Comma-Separated Values) file.

Basically, a CSV file is just a plain text file in which each line is a row and the columns are separated by a comma (surprise!) or a semicolon. In addition, the value in each column can optionally be enclosed in quotation marks, which allows a value to contain the separator character itself.

So I started out with the following code, just as I would when reading a normal text file:

try
{
    using (StreamReader readFile = new StreamReader(path))
    {
        // Do something here...
    }
}
catch (Exception e)
{
    // Do some error handling here...
}

As you can see, this is really straightforward. First, I declare a StreamReader object in a using statement; through the readFile object I can then read the file. The using statement is important because it does the cleanup for me: it calls StreamReader.Dispose() when the block is exited, even if an exception is thrown inside it. I always wrap this kind of code in a try...catch because, when you work with files, errors do occasionally happen (the file may be missing, locked by another program, or unreadable).
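To make the cleanup guarantee concrete, here is a small sketch. The TrackedResource and UsingDemo types are hypothetical, made up for illustration. It shows that a using block still calls Dispose() when an exception escapes it, and it spells out roughly the try...finally that the C# compiler turns a using statement into.

```csharp
using System;

// Hypothetical IDisposable that only records whether Dispose() was called.
public sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose() => Disposed = true;
}

public static class UsingDemo
{
    // Throws from inside a using block; the using statement still calls
    // Dispose() before the exception reaches the catch.
    public static TrackedResource ThrowInsideUsing()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated read error");
            }
        }
        catch (InvalidOperationException)
        {
            // Error handling would go here.
        }
        return resource;
    }

    // Roughly what the compiler generates for
    // using (var r = new TrackedResource()) { ... }
    public static TrackedResource ExpandedUsing()
    {
        var r = new TrackedResource();
        try
        {
            // Body of the using block.
        }
        finally
        {
            if (r != null)
                ((IDisposable)r).Dispose();
        }
        return r;
    }
}
```

Both methods return a resource whose Disposed flag is true. The first one shows that the cleanup also happens when reading fails halfway through.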

Now, to read the data from the CSV file, I add the following lines of code inside the using statement:

List<string[]> parsedData = new List<string[]>();

string line;
string[] row;

while ((line = readFile.ReadLine()) != null)
{
    row = line.Split(',');
    parsedData.Add(row);
}

This declares a new List that holds string arrays (one array per row), plus the line and row variables needed while traversing the file. I then call the ReadLine() method of the StreamReader through the readFile object in a while loop; when there are no more lines in the file, ReadLine() returns null and the loop ends. Inside the loop, I use string.Split() to split the line into an array of strings (my columns) and then add that array to my List (parsedData).
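Because StreamReader derives from TextReader, the same loop can be tried out on an in-memory string through StringReader, without any file on disk. The following is a minimal sketch; LineSplitDemo and ReadRows are hypothetical names. It also shows what string.Split does to a quoted field that contains the separator.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineSplitDemo
{
    // The same read-and-split loop, written against TextReader so it accepts
    // either a StreamReader (file) or a StringReader (in-memory text).
    public static List<string[]> ReadRows(TextReader reader, char splitter)
    {
        var parsedData = new List<string[]>();
        string line;

        // ReadLine() returns null once the end of the input is reached.
        while ((line = reader.ReadLine()) != null)
        {
            parsedData.Add(line.Split(splitter));
        }
        return parsedData;
    }
}
```

With the input a,b,c followed by 1,2,3 on the next line, this returns two rows of three columns. With the line "Smith, John",42, however, Split returns three pieces ("Smith, then  John", then 42) with the quotation marks still attached, because Split knows nothing about quoting.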

The next problem was that I didn’t know exactly which encoding the file would be in. What to do then? I settled on a solution where I tell the StreamReader which encoding the file most likely has, by passing it as a second argument to the StreamReader constructor:

using (StreamReader readFile = new StreamReader(path, encoding))

StreamReader then decodes the file with that encoding, unless the file starts with a byte order mark (BOM). In that case, the encoding indicated by the BOM (for example UTF-8 or UTF-16) takes precedence.
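The effect of the encoding argument can be checked without a file by feeding raw bytes to a StreamReader through a MemoryStream. The following is a minimal sketch. EncodingDemo and its members are hypothetical names. ISO-8859-1 is only an assumed "probable" encoding, chosen because it is built into every .NET runtime, whereas Windows-1252 needs the CodePagesEncodingProvider on .NET Core.

```csharp
using System.IO;
using System.Linq;
using System.Text;

public static class EncodingDemo
{
    // Assumed "probable" encoding for this illustration (Latin-1).
    public static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Decodes raw bytes with a StreamReader and the given encoding.
    // This overload also honours a byte order mark at the start of the data.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // "Müller" as a Latin-1 file stores it (ü is the single byte 0xFC).
    public static byte[] Latin1Bytes() => Latin1.GetBytes("M\u00FCller");

    // "Müller" as a UTF-8 file with a byte order mark stores it.
    public static byte[] Utf8WithBomBytes() =>
        Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("M\u00FCller"))
            .ToArray();
}
```

Decoding the Latin-1 bytes with Latin-1 gives back "Müller", while decoding the same bytes as UTF-8 turns the ü into the replacement character U+FFFD. The UTF-8 bytes with a BOM decode correctly even when Latin-1 is passed in, because the BOM wins.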

Finally, all of this can be wrapped in a nice method. I also added a check that the file I want to parse actually exists, and a splitter parameter so the same method works for both comma- and semicolon-separated files. There you go:

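using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Wrapper class so the method compiles on its own; the class name is illustrative.
public static class CsvParser
{
    // Reads the file at path with the given encoding and splits each line on
    // splitter (for example ',' or ';'). Returns null if the file does not exist.
    // Note: quotation marks are not interpreted, so a separator inside a
    // quoted value still splits that value.
    public static List<string[]> ParseCSV(string path, Encoding encoding, char splitter)
    {
        if (!File.Exists(path))
            return null;

        List<string[]> parsedData = new List<string[]>();

        try
        {
            using (StreamReader readFile = new StreamReader(path, encoding))
            {
                string line;
                string[] row;

                // One string[] per line, one element per column.
                while ((line = readFile.ReadLine()) != null)
                {
                    row = line.Split(splitter);
                    parsedData.Add(row);
                }
            }
        }
        catch (Exception e)
        {
            // Do some error handling here...
        }

        return parsedData;
    }
}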
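The method above still splits blindly on the separator, so the quotation marks mentioned at the start are not honoured. The following is a minimal sketch of a quote-aware replacement for line.Split(splitter). It assumes RFC 4180-style quoting, where a doubled quote inside a quoted field stands for one literal quote. It also assumes that no field contains a line break, since the file is still read line by line. CsvLine and SplitCsvLine are hypothetical names, and the parser is deliberately lenient rather than a full validator.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line on splitter while honouring double-quoted fields:
    // a separator inside quotes is kept as data, the enclosing quotes are
    // dropped, and "" inside a quoted field becomes a single ".
    public static string[] SplitCsvLine(string line, char splitter)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote ("")
                        i++;                 // skip the second quote
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // any char, including splitter
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote (lenient: allowed anywhere)
            }
            else if (c == splitter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field, possibly empty
        return fields.ToArray();
    }
}
```

To use it, the line row = line.Split(splitter); in ParseCSV would become row = CsvLine.SplitCsvLine(line, splitter);. If a dependency on the Microsoft.VisualBasic assembly is acceptable, its TextFieldParser class (namespace Microsoft.VisualBasic.FileIO, with HasFieldsEnclosedInQuotes set to true) is an existing alternative that also handles quoted fields.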