Kristofer Källsbo

Reputation: 1087

Regular expression for parsing CSV

I'm trying to parse a CSV file in C# by splitting each line on commas (,). I got the split itself to work with this regular expression:

[\t,](?=(?:[^\"]|\"[^\"]*\")*$)

Splitting this string:

2012-01-06,"Some text with, comma",,"300,00","143,52"

Gives me:

2012-01-06
"Some text with, comma"

"300,00"
"143,52"

But I can't figure out how to strip the surrounding double quotes from the output so that I get this instead:

2012-01-06
Some text with, comma

300,00
143,52

Any suggestions?

Upvotes: 1

Views: 7226

Answers (3)

iamkrillin

Reputation: 6876

If you are trying to parse a CSV and using .NET, don't use regular expressions. Use a component that was created for this purpose. See the question CSV File Imports in .Net.

I know the CSV specification looks simple enough, but trust me, you are in for heartache and destruction if you continue down this path.
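
As one illustration of the "use a component" route, here is a minimal sketch built on TextFieldParser from the Microsoft.VisualBasic.FileIO namespace, which ships with .NET. It assumes your project can reference the Microsoft.VisualBasic assembly; the CsvLineDemo class and ParseLine method are hypothetical names used only for this example, not part of any library.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper class, for illustration only.
public static class CsvLineDemo
{
    // Parses a single CSV line and returns its fields with the
    // enclosing quotes already removed by the parser.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is no data left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Calling CsvLineDemo.ParseLine on the line from the question should return five fields, with the comma inside "Some text with, comma" preserved and the quotes stripped. For a whole file you would construct the parser over the file and call ReadFields in a loop until EndOfData is true.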

Upvotes: 2

aquinas

Reputation: 23796

Something like this should work. I wouldn't use a regex for this purpose either, but YMMV.

var sp = Regex.Split(a, "[\t,](?=(?:[^\"]|\"[^\"]*\")*$)")
     .Select(s => Regex.Replace(s.Replace("\"\"","\""),"^\"|\"$","")).ToArray();

The idea is that each field produced by the split is first passed through Replace, which collapses every doubled double quote ("") into a single double quote ("). The result is then fed to the second regex, which simply removes a double quote at the beginning and at the end of the string.

The first replace is needed for strings like this:

var a = "1999,Chevy,\"Venture \"\"Extended Edition, Very Large\"\" Dude\",\"\",\"5000.00\"";

Splitting this gives you a field containing ""Extended Edition, Very Large"". Inside a quoted CSV field a doubled quote is an escaped quote, so each "" needs to be changed to a single ".
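
The same two rules (drop the enclosing quotes, collapse doubled quotes) can also be applied in a single pass without a regex. The following is only a sketch under stated assumptions: CsvSplitter and SplitCsvLine are hypothetical names, the input is a single line with no embedded newlines, and, like the regex above, it treats both commas and tabs as separators.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class CsvSplitter
{
    // Splits one CSV line into unquoted fields in a single pass.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote: keep one literal quote
                    i++;                 // skip the second quote of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote: not part of the value
                }
                else
                {
                    current.Append(c);   // separators inside quotes are data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote: not part of the value
            }
            else if (c == ',' || c == '\t')
            {
                fields.Add(current.ToString()); // separator ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing separator
        return fields;
    }
}
```

Passing the string a from above to CsvSplitter.SplitCsvLine should yield five fields, the third being Venture "Extended Edition, Very Large" Dude and the fourth being empty.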

Upvotes: 2

Chris Dargis

Reputation: 6043

Why are you using regular expressions for this? Ensuring the file is well-formed?

To remove the quotes, you can use String.Replace():

String s = "\"Some text with, comma\"";
s = s.Replace("\"", "");

// After the line has been matched
String line = "2012-01-06,\"Some text with, comma\",,\"300,00\",\"143,52\"";
// Note: a plain Split(',') also splits on commas inside quoted fields.
String[] fields = line.Split(',');
for (int i = 0; i < fields.Length; i++)
{
   // Call a function to remove quotes
   fields[i] = removeQuotes(fields[i]);
}

String removeQuotes(String s)
{
   return s.Replace("\"", "");
}

Upvotes: 2
