Reputation: 298
The CSV structure is as follows.
"field1","field2","field3,with,commas","field4",
There are four fields in the CSV file.
the first : field1
the second : field2
the third : field3,with,commas
the fourth : field4
Here is my regular expression for awk.
'^"|","|",$'
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print NF}'
6
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $1}'
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $2}'
field1
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $3}'
field2
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $4}'
field3,with,commas
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $5}'
field4
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $6}'
Two issues remain with my regular expression '^"|","|",$'.
1. The 4 fields are parsed as 6 fields by '^"|","|",$'.
2. $1 and $6 are parsed as empty strings.
How can I write a regular expression (called format below) that produces the following?
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print NF}'
4
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $1}'
field1
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $2}'
field2
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $3}'
field3,with,commas
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $4}'
field4
Upvotes: 1
Views: 68
Reputation: 13259
A workaround could be to set FS to "," (quote, comma, quote) and use gsub to remove the leading quote and the trailing quote-comma from each record:
echo '"field1","field2","field3,with,commas","field4",' | awk -v FS='","' '{gsub(/^"|",$/, ""); print NF, $1, $2, $3, $4}'
4 field1 field2 field3,with,commas field4
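To see why this works, here is a small sketch (the shell variable names line and result are only for illustration) that prints NF, $1 and $4 before and after the gsub call. Since gsub without a third argument edits $0, awk splits the record again afterwards, so the stray quote on $1 and the trailing quote-comma on $4 disappear:

```shell
line='"field1","field2","field3,with,commas","field4",'
result=$(printf '%s\n' "$line" | awk -v FS='","' '{
    # Before cleanup: the outer quote and the trailing quote-comma are still attached.
    print "before:", NF, $1, $4
    # gsub without a third argument edits $0, so awk splits the record again.
    gsub(/^"|",$/, "")
    print "after:", NF, $1, $4
}')
printf '%s\n' "$result"
```

Note that this assumes no field contains the sequence "," itself and that every record ends with a trailing comma, as in the sample line.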
Upvotes: 2
Reputation: 10497
I think the FPAT variable is probably what you want. It is a gawk extension that describes what a field looks like instead of what separates fields. Take a look at the documentation and examples in the gawk User's Guide.
Upvotes: 0