dan sawyer
dan sawyer

Reputation: 193

sed convert double quoted csv comma delimited to non quoted pipe delimited

The input file 'input.file' is:

"col one",,,"col, two",,"col, three"
,"col one",,"col, two",,"col, three"
,,"col one","col, two",,"col, three"

The desired output file is:

col one|||col, two||col, three
|col one||col, two||col, three
||col one|col, two||col, three

This is what is in place so far:

sed -r 's/"([^"])/\1/g; s/,/|/g'  ./input.file

The objective is the first substitution

s/"([^"])/\1/g

would parse arbitrary fields delineated by " and copy them to the output and the second substitution

s/,/|/g

would replace the ',' not included in double quoted fields with '|'.

Upvotes: 1

Views: 699

Answers (1)

Sundeep
Sundeep

Reputation: 23667

$ cat ip.txt 
"col one",,,"col, two",,"col, three"
,"col one",,"col, two",,"col, three"
,,"col one","col, two",,"col, three"

$ perl -pe 's/"[^"]+"(*SKIP)(*F)|,/|/g; s/"//g' ip.txt
col one|||col, two||col, three
|col one||col, two||col, three
||col one|col, two||col, three
  • "[^"]+"(*SKIP)(*F) skip the pattern "[^"]+" and look for any other alternate match provided
    • (*F) is short-form for (*FAIL) and (?!) can also be used instead
  • |, alternate pattern to match ,
  • |/g replace all , with |
  • s/"//g then delete all "


Further Reading:

Upvotes: 1

Related Questions