Simon Williams

Reputation: 1046

CSV Regular Expression

I have inherited some code that uses regular expressions to parse CSV-formatted data. Until now it didn't need to cope with empty string fields, but the requirements have changed and empty fields are now possible.

I have changed the regular expression from this:

new Regex("((?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

to this

new Regex("((?<field>[^\",\\r\\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

(i.e. I have changed both + quantifiers to *)

The problem is that I now get an extra empty field at the end, e.g. "ID,Name,Description" gives me four fields: "ID", "Name", "Description" and "".

Can anyone spot why?

Upvotes: 0

Views: 1733

Answers (3)

Petar Ivanov

Reputation: 93010

The problem with your regex is that it can now match the empty string. $ works a little like a lookahead: it guarantees that the match ends at the end of the string, but it is not part of the match and consumes nothing.

So when you have "ID,Name,Description", your first match is

"ID," and the rest is "Name,Description"

Then the next match is

"Name," and the rest is "Description"

The next match:

"Description" and the rest is ""

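So the final match is the empty string at the end of the input: a zero-length field followed by $, which still succeeds there because the previous match did not consume the end of the string.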
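
To see the zero-length match directly, a minimal C# sketch like the following lists every match with its position and length. It uses the modified pattern from the question unchanged; the names EmptyMatchDemo and ListMatches are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // The modified pattern from the question (+ changed to *), copied unchanged.
    public const string Pattern =
        "((?<field>[^\",\\r\\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))";

    // Returns one "index:length:field" entry per match,
    // so that zero-length matches are visible.
    public static List<string> ListMatches(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            result.Add(m.Index + ":" + m.Length + ":" + m.Groups["field"].Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string entry in ListMatches("ID,Name,Description"))
        {
            Console.WriteLine(entry);
        }
    }
}
```

The input is 19 characters long, so the last entry should be a match of length 0 at index 19, right after "Description".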

Upvotes: 1

xanatos

Reputation: 111820

This one:

var rx = new Regex("((?<=^|,)(?<field>)(?=,|$)|(?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

I moved the handling of "blank" fields into a separate, third alternative (a third "or", placed first in the pattern). The handling of "" already worked (it is the second (?<field>) block of your code, the quoted one), so what remains to handle are four cases:

,
,Id
Id,
Id,,Name

And this one should do it:

(?<=^|,)(?<field>)(?=,|$)

An empty field must be preceded by the beginning of the row (^) or by a comma, must be of length zero (there is nothing inside the (?<field>) capture), and must be followed by a comma or by the end of the line ($).
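
The four cases above can be checked with a short C# sketch like this one. It uses the pattern from this answer unchanged; the names BlankFieldDemo and Fields are hypothetical, and it only exercises single-row inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlankFieldDemo
{
    // The pattern from the answer above, copied unchanged.
    public static readonly Regex Rx = new Regex(
        "((?<=^|,)(?<field>)(?=,|$)|(?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

    // Collects the "field" group of every match in a single row.
    public static List<string> Fields(string row)
    {
        var fields = new List<string>();
        foreach (Match m in Rx.Matches(row))
        {
            fields.Add(m.Groups["field"].Value);
        }
        return fields;
    }

    public static void Main()
    {
        string[] rows = { ",", ",Id", "Id,", "Id,,Name", "ID,Name,Description" };
        foreach (string row in rows)
        {
            // Brackets make empty fields visible in the output.
            Console.WriteLine(row + " -> [" + string.Join("][", Fields(row)) + "]");
        }
    }
}
```

Note that ^ and $ are used here without RegexOptions.Multiline, so they anchor to the whole input rather than to each row. Multi-row input with a blank field next to a row break is worth testing separately.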

Upvotes: 2

Paolo Tedesco

Reputation: 57172

I would suggest using the FileHelpers library. It is easy to use, does the job, and will make your code much easier to maintain.

Upvotes: 1
