Ernie S

Reputation: 14270

Single regex to decode CSV with embedded double quotes and commas

I have lots of CSV data that I am trying to decode using regex. I am trying to build on an existing code base that other people/projects use, and I don't want to risk breaking their data flows by refactoring the class too much. So I was wondering whether it is possible to decode this text with a single regex (which is how the class currently works):

f1,f2,f3,f4,f5,f6,f7
,"clean text","with,embedded,commas.","with""embedded""double""quotes",,"6.1",

The first row is the header. If I save this as xxx.csv and open it in Excel, it is parsed correctly and reads as follows (the gaps between the fields are the cell breaks):

f1  f2  f3  f4  f5  f6  f7
clean text  with,embedded,commas.   with"embedded"double"quotes     6.1     

But when I try this in .NET, I get stuck on the regex. I have this:

string regExp = "(((?<x>(?=[,\\r\\n]+))|\"(?<x>([^\"]|\"\")+)\"|(?<x>[^,\\r\\n]+)),?)";

You can see it in action here:

http://ideone.com/hRq8xe

Which results in this:

<start>

clean text
with,embedded,commas.
with""embedded""double""quotes

6.1
<end>

This is very close, but it does not replace the escaped doubled quotes ("") with a single double quote (") like Excel does. I could not come up with a regex that works better. Can it be done?

Upvotes: 1

Views: 193

Answers (1)

Eder

Reputation: 1884

Maybe you can match your string using regular expression conditionals, with the following constructs:

  • if-then-else: (?(?=regex)then|else)
  • if-then-else with multiple alternatives: (?(?=condition)(then1|then2|then3)|(else1|else2|else3))
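
To make the conditional construct concrete, here is a minimal C# sketch. The class name ConditionalDemo and the pattern are my own illustration (not taken from the question): if the text at the current position starts with a double quote, the "then" branch matches a quoted field, otherwise the "else" branch matches a bare field.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // (?(?=")  THEN: "..." allowing doubled quotes inside
    //          ELSE: anything up to the next comma or line break )
    public static readonly Regex Field =
        new Regex("(?(?=\")\"(?:[^\"]|\"\")*\"|[^,\\r\\n]*)");

    public static void Main()
    {
        // Lookahead sees a quote, so the quoted branch is used.
        Console.WriteLine(Field.Match("\"a,b\",c").Value); // "a,b"
        // No leading quote, so the bare-field branch is used.
        Console.WriteLine(Field.Match("plain,c").Value);   // plain
    }
}
```

Note that the conditional only chooses which branch to try; it still returns the matched text exactly as it appears in the input.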

I came up with the following pattern to match the body of your text: ([^\,]+(?(?=[^\,])([^\"]+")|([^\,]+,))). However, you will need to put in extra effort to create a completely matching expression for your text, or end up using a file parser. If so, you can take a look at FileHelpers, a pretty neat library for parsing text files.
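
As for the doubled quotes: a regex match can only capture text that is literally in the input, so a pattern on its own cannot turn "" into ". A small post-processing step on the captured group is one way around that. The sketch below keeps the pattern from the question unchanged; the class CsvRowDecoder and the method Decode are hypothetical names used only for illustration, and I am assuming the existing class can run one extra Replace per field.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRowDecoder
{
    // The pattern from the question, unchanged.
    const string Pattern =
        "(((?<x>(?=[,\\r\\n]+))|\"(?<x>([^\"]|\"\")+)\"|(?<x>[^,\\r\\n]+)),?)";

    // Hypothetical helper: returns the fields the pattern finds in one row,
    // with escaped quotes ("") collapsed to a single quote (").
    public static List<string> Decode(string row)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(row, Pattern))
        {
            string value = m.Groups["x"].Value;
            // Only fields that were quoted in the input can carry escaped quotes.
            if (m.Value.StartsWith("\"", StringComparison.Ordinal))
                value = value.Replace("\"\"", "\"");
            fields.Add(value);
        }
        return fields;
    }

    public static void Main()
    {
        string row = ",\"clean text\",\"with,embedded,commas.\"," +
                     "\"with\"\"embedded\"\"double\"\"quotes\",,\"6.1\",";
        foreach (string field in Decode(row))
            Console.WriteLine(field);
    }
}
```

With this, the fourth field should come out as with"embedded"double"quotes, which is what Excel shows. The matching itself is still done by the single regex, so the rest of the class should not need to change.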


Upvotes: 1
