showkey

Reputation: 298

How can I parse a CSV file in awk without getting two blank fields?

The CSV file is structured as follows.

"field1","field2","field3,with,commas","field4",          

There are four fields in the CSV file:
the first: field1
the second: field2
the third: field3,with,commas
the fourth: field4

Here is the regular expression I use as the awk field separator.

 '^"|","|",$' 

debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print NF}'
6
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $1}'

debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $2}'
field1
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $3}'
field2
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $4}'
field3,with,commas
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $5}'
field4
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $6}'

Two issues remain with my regular expression '^"|","|",$'.

1. The 4 fields are parsed as 6 fields.
2. $1 and $6 are parsed as blank.

How can I write a regular expression (shown as format below) that produces the following?

echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print NF}'
4
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F  format '{print $1}'
field1
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $2}'
field2
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $3}'
field3,with,commas
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $4}'
field4

Upvotes: 1

Views: 68

Answers (2)

oliv

Reputation: 13259

A workaround is to set FS to "," (quote, comma, quote) and use gsub to strip the leading " and the trailing ", from each record. Because gsub modifies $0, awk re-splits the record with FS afterwards:

echo '"field1","field2","field3,with,commas","field4",' | awk -v FS='","' '{gsub(/^"|",$/, ""); print NF, $1, $2, $3, $4}'
4 field1 field2 field3,with,commas field4
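
The same idea can also be sketched without modifying $0, by copying the record into a variable and calling split() on it. This is only an illustration under the assumption that every field is quoted and contains no embedded double quotes; the names line, split_quoted, rec and f are made up for the example and are not part of the answer above.

```shell
# Sample record from the question (trailing comma included).
line='"field1","field2","field3,with,commas","field4",'

# Hypothetical helper: prints the field count, then one field per line.
split_quoted() {
  awk '{
    rec = $0
    sub(/^"/, "", rec)        # drop the opening quote of the first field
    sub(/",?$/, "", rec)      # drop the closing quote and the optional trailing comma
    n = split(rec, f, /","/)  # what remains is separated only by quote-comma-quote
    print n
    for (i = 1; i <= n; i++) print f[i]
  }'
}

printf '%s\n' "$line" | split_quoted
```

For the sample line this should print 4, followed by field1, field2, field3,with,commas and field4, each on its own line. The original separator '^"|","|",$' yields 6 fields because the matches at the very start and very end of the record each leave an empty field on their outer side.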

Upvotes: 2

Component 10

Reputation: 10497

I think the FPAT variable is probably what you want. Note that FPAT is specific to GNU awk (gawk). Take a look at the documentation and examples in the gawk User's Guide.

Upvotes: 0
