AndrewC

Reputation: 6730

Splitting string on commas when data can contain commas

I have a CSV file (which I didn't design and can't change) that contains lines like the following:

"Surname, Firstname", yes, no, somestring, whatever, etc

As you can see, the first comma is not one I'd want to split the string on, because it is enclosed within the quotation marks.

Because of this, a simple string.Split(',') won't work: it would give me an array of length 7 for the above string instead of 6.
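To make the problem concrete, here is a minimal sketch of the naive approach. The class and method names (NaiveSplitDemo, SplitNaively) are made up for illustration only.

```csharp
using System;

static class NaiveSplitDemo
{
    // Hypothetical helper: splits on every comma and ignores quotation marks entirely,
    // so a comma inside a quoted field is treated as a separator too.
    public static string[] SplitNaively(string line)
    {
        return line.Split(',');
    }
}
```

Calling NaiveSplitDemo.SplitNaively on the sample line above returns 7 pieces, because "Surname and Firstname" end up in separate elements.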

Is there a way to get around this? I was thinking of using regex to split the string instead but I'm not competent enough in regex to think of a pattern that would only split on commas that are not enclosed inside quotation marks.

I can think of ugly, hacky ways to do it by reading each string char by char but this would have to be a last resort as I'm sure there's a better way to do it!

Upvotes: 5

Views: 3070

Answers (3)

Reed Copsey

Reputation: 564931

You can handle this easily by using the TextFieldParser class. Just set HasFieldsEnclosedInQuotes to true.
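As a rough sketch of what that might look like for a single line (TextFieldParser lives in Microsoft.VisualBasic.FileIO, so a .NET Framework project needs a reference to the Microsoft.VisualBasic assembly). The CsvLineReader class and ParseLine method are hypothetical names, and reading from a StringReader rather than a file is an assumption made to keep the example self-contained.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class CsvLineReader
{
    // Hypothetical helper: parses one line of comma-delimited text into its fields.
    // Returns null if the input contains no data, because ReadFields does so at end of data.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat commas inside double quotes as part of the field.
            parser.HasFieldsEnclosedInQuotes = true;
            // Strip the spaces that follow each comma in the sample data.
            parser.TrimWhiteSpace = true;
            return parser.ReadFields();
        }
    }
}
```

For the sample line in the question, this should yield six fields, with the quoted name kept together as one field. For a whole file, you would pass the file path to the TextFieldParser constructor instead and call ReadFields in a loop until EndOfData is true.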

Upvotes: 5

Jonathan Wood

Reputation: 67355

I know there are a lot of people here who think character-by-character comparisons should never be used and will strongly disagree with me, but I'm not convinced that companies like Microsoft are the only ones who should be doing that sort of programming.

After all, Split does character-by-character comparisons, so why is it any less ugly when you call existing code that doesn't do exactly what you want?

At any rate, my approach was to write my own code. And I've posted the code online at http://www.blackbeltcoder.com/Articles/files/reading-and-writing-csv-files-in-c.
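For readers who want a feel for the character-by-character approach, here is a minimal sketch. It is not the code from the linked article; the QuoteAwareSplitter name is invented for this example, and it assumes that a doubled quote ("") inside a quoted field stands for a literal quote and that surrounding whitespace should be trimmed from every field.

```csharp
using System.Collections.Generic;
using System.Text;

static class QuoteAwareSplitter
{
    // Hypothetical helper: splits one line on commas that are not inside double quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    // Assumption: a doubled quote is an escaped literal quote.
                    current.Append('"');
                    i++;
                }
                else
                {
                    inQuotes = false; // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                // Unquoted comma: the current field is complete.
                fields.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // The last field has no trailing comma.
        fields.Add(current.ToString().Trim());
        return fields;
    }
}
```

Because it works on one line at a time, this sketch does not cope with a quoted field that spans several lines, and the trimming also removes leading or trailing spaces inside quoted fields.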

Upvotes: 1

Oded

Reputation: 499402

I would suggest using a CSV parser library, since there are other cases that you might not have thought of (for example, a new line as part of a quoted field).

The Microsoft.VisualBasic.FileIO namespace has a nice class that can help: the TextFieldParser.

Upvotes: 2
