Rajat Saini

Reputation: 557

Parsing CSV files with multiple formats in C# using regex

I have been trying to parse a CSV file with three fields. The first two fields are simple and easy to extract. The problem is the third field, which is a free-form string and can therefore contain special characters, including the ',' that delimits the fields. I tried enclosing the string field in double quotes ("), but my requirement is that a simple string (one without special characters) may also appear without quotes. I also need to handle line breaks inside the string. Below is a sample of such a CSV file.

123,true,This is a memo

234,false,"This is also a memo"

345,true,

456,true,Above me is a blank memo

567,false,"This has a ,

in it"

678,true,This has a , in it <--- This record should be rejected

789,false,""

890,true,Above me is also a valid blank memo

I also found a good tool for testing the regex format string at http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

So far I have used the following format string: ^(""(?:[^""]|"""")""|[^,]),(""(?:[^""]|"""")""|[^,])$

The problem with this format string is that it does not handle fields that span multiple lines, and it does not reject a field that opens with a double quote but has no closing double quote.
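To make the two requirements concrete, here is a minimal sketch of a pattern that validates one already-isolated record. It assumes, based only on the sample above, that the first field is a run of digits and the second is the literal true or false; the class and method names (MemoRecord, TryParse) are hypothetical. It does not split a file into records, so it is an illustration of the quoting rules rather than a complete solution.

```csharp
using System;
using System.Text.RegularExpressions;

static class MemoRecord
{
    // Third field is either a quoted string (may contain commas, line breaks
    // and doubled quotes) or an unquoted string with no comma, quote or line break.
    // Assumption: field 1 is digits, field 2 is true/false, as in the sample data.
    static readonly Regex Pattern = new Regex(
        @"\A(?<id>\d+),(?<flag>true|false),(?:""(?<memo>(?:[^""]|"""")*)""|(?<memo>[^,""\r\n]*))\z",
        RegexOptions.CultureInvariant);

    // Returns false for malformed records, e.g. an unquoted memo containing ','
    // or a memo with an opening quote and no closing quote.
    public static bool TryParse(string record, out string memo)
    {
        Match m = Pattern.Match(record);
        // Collapse doubled quotes ("") back into a single quote character.
        memo = m.Success ? m.Groups["memo"].Value.Replace("\"\"", "\"") : null;
        return m.Success;
    }
}
```

Because \A and \z anchor the whole input, a quoted memo can contain a line break, while an unquoted memo with a comma or a quoted memo without a closing quote fails to match.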

Thanks in advance.


Thanks for the help, guys, but I needed to parse custom data in CSV and had to write my own parser. I parse each field separately and apply regex strings in small chunks.
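For readers who also end up writing their own parser, the following is one possible shape for a character-by-character reader. It is a sketch under stated assumptions, not the parser described above: the names SimpleCsv and Parse are hypothetical, line endings outside quotes are treated as record separators, and a blank line yields a record with a single empty field.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SimpleCsv
{
    // Splits CSV text into records of fields. Quoted fields may contain commas,
    // line breaks and doubled quotes (""). Throws FormatException on a stray
    // quote, text after a closing quote, or an unterminated quoted field.
    public static List<string[]> Parse(string text)
    {
        var records = new List<string[]>();
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;   // currently between an opening and closing quote
        bool wasQuoted = false;  // current field was quoted and is already closed

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c != '"') field.Append(c);
                else if (i + 1 < text.Length && text[i + 1] == '"') { field.Append('"'); i++; }
                else { inQuotes = false; wasQuoted = true; }
            }
            else if (c == '"')
            {
                if (field.Length > 0 || wasQuoted)
                    throw new FormatException("Unexpected quote at offset " + i);
                inQuotes = true;
            }
            else if (c == ',' || c == '\n')
            {
                fields.Add(field.ToString());
                field.Clear();
                wasQuoted = false;
                if (c == '\n') { records.Add(fields.ToArray()); fields.Clear(); }
            }
            else if (c != '\r') // '\r' outside quotes is ignored, so "\r\n" works too
            {
                if (wasQuoted)
                    throw new FormatException("Text after closing quote at offset " + i);
                field.Append(c);
            }
        }

        if (inQuotes) throw new FormatException("Unterminated quoted field");
        // Flush the last record when the text does not end with a line break.
        if (field.Length > 0 || wasQuoted || fields.Count > 0)
        {
            fields.Add(field.ToString());
            records.Add(fields.ToArray());
        }
        return records;
    }
}
```

With this approach a record such as 678,true,This has a , in it comes back with four fields, so the caller can reject any record whose Length is not 3.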

Upvotes: 0

Views: 964

Answers (1)

alexn

Reputation: 58952

There is no need to reinvent the wheel. I recommend using an existing CSV parser; there are many good alternatives.

I have had great success with CsvReader; it is very fast and easy to use. Basic usage:

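// Assumption: CsvReader here is the one from the LumenWorks.Framework.IO.Csv
// library ("A Fast CSV Reader"), whose API matches the calls below.
using System;
using System.IO;
using LumenWorks.Framework.IO.Csv;

// The second argument (true) says the first row contains the column headers.
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
    int fieldCount = csv.FieldCount;
    string[] headers = csv.GetFieldHeaders();

    // Read one record at a time and print each field as "header = value;".
    while (csv.ReadNextRecord())
    {
        for (int i = 0; i < fieldCount; i++)
            Console.Write("{0} = {1};", headers[i], csv[i]);

        Console.WriteLine();
    }
}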

Upvotes: 4
