Reputation: 9
I have a big CSV field, and I use awk with the field separator set to a comma. However, some fields are quoted and contain a comma, and I'm facing this issue:
Original file:
Downloads $ cat testfile.csv
"aaa","bbb","ccc","dddd"
"aaa","bbb","ccc","d,dd,d"
"aaa","bbb","ccc","dd,d,d"
I am trying this way:
Downloads $ cat testfile.csv | awk -F "," '{ print $2","$3","$4 }'
"bbb","ccc","dddd"
"bbb","ccc","d
"bbb","ccc","dd
Expecting result:
"bbb","ccc","dddd"
"bbb","ccc","d,dd,d"
"bbb","ccc","dd,d,d"
Upvotes: 2
Views: 1775
Reputation: 803
GNU AWK Manual, Splitting by content
Simple test-case:
ID,Fruit,Comment or Description,Cost
1,"Apple",,10.00
2,"Banana",,20.00
3,"Ananas",,40.00
4,"Orange",,8.00
5,"Fruit,with,comma",,100.00
$ awk -v FPAT="([^,]*)|(\"[^\"]+\")" -v OFS=, '{print $2, $4}' f.csv Fruit,Cost
"Apple",10.00
"Banana",20.00
"Ananas",40.00
"Orange",8.00
"Fruit,with,comma",100.00
Upvotes: 0
Reputation: 52112
I would use a tool that is able to properly parse CSV, such as xsv. With it, the command would look like
$ xsv select 2-4 testfile.csv
bbb,ccc,dddd
bbb,ccc,"d,dd,d"
bbb,ccc,"dd,d,d"
or, if you really want every value quoted, with a second step:
$ xsv select 2-4 testfile.csv | xsv fmt --quote-always
"bbb","ccc","dddd"
"bbb","ccc","d,dd,d"
"bbb","ccc","dd,d,d"
Upvotes: 2
Reputation: 22012
If gawk
or GNU awk
is available, you can make use of FPAT
, which matches the fields, instead of splitting on field separators.
awk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, '{print $2, $3, $4}' testfile.csv
Result:
"bbb","ccc","dddd"
"bbb","ccc","d,dd,d"
"bbb","ccc","dd,d,d"
The string ([^,]+)|(\"[^\"]+\")
is a regex pattern which matches either of:
([^,]+)
... matches a sequence of any characters other than a comma.(\"[^\"]+\")
... matches a string enclosed by double quotes (which may include commas in between).The parentheses around the patterns are put for visual clarity purpose and the regex will work without them such as FPAT='[^,]+|\"[^\"]+\"'
because the alternative |
has lower precedence.
Upvotes: 0
Reputation: 2687
Include (escaped) quotes in your field separator flag, and add them to your output print fields:
testfile.csv | awk -F "\",\"" '{print "\""$2"\",\""$3"\",\""$4}'
output:
"bbb","ccc","dddd"
"bbb","ccc","d,dd,d"
"bbb","ccc","dd,d,d"
Upvotes: 1