Reputation: 493
I wrote a gawk script, script1, using GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2). Here are the input and the script.
input.csv:
,b,c,d,e,f
g,h,i,j,k,l
script1:
gawk -v FPAT='[^,"]*|"([^"]|"")*"' -v OFS=, '{print $1,$2,$3,$4,$5,$6}' input.csv
script1 output:
The first line of script1's output consists entirely of empty fields, even though only the first input field is empty!
,,,,,
g,h,i,j,k,l
I wrote script2 to check that the fields are parsed correctly.
script2:
gawk -v FPAT='[^,"]*|"([^"]|"")*"' -v OFS=, '{for (i=1; i<=NF; i++) printf "field #%d: %s\n", i, $(i)}' input.csv
script2 output:
Field #1 of the first record is empty and the other fields are populated, so I confirmed that the record is parsed properly here.
field #1:
field #2: b
field #3: c
field #4: d
field #5: e
field #6: f
field #1: g
field #2: h
field #3: i
field #4: j
field #5: k
field #6: l
Why is script1 unable to parse a CSV line with an empty field at the beginning in gawk?
sincerely
Upvotes: 1
Views: 159
Reputation: 204099
It's a bug, now reported to the GNU awk developers (see https://lists.gnu.org/archive/html/bug-gawk/2019-10/msg00041.html). In the meantime you can work around it this way:
$ gawk -v FPAT='[^,"]*|"([^"]|"")*"' -v OFS=, '{oFPAT=FPAT; FPAT=""; FPAT=oFPAT} {print $1,$2,$3,$4,$5,$6}' input.csv
,b,c,d,e,f
g,h,i,j,k,l
Here's a simplified version of the problem (based on @karakfa's earlier work):
$ echo ',b' | gawk -v FPAT='[^,]*' '{print $2}'
b
$ echo ',b' | gawk -v FPAT='[^,]*' '{print $1, $2}'
$
It can be worked around in the same way as a previous bug (https://lists.gnu.org/archive/html/bug-gawk/2017-04/msg00000.html):
$ echo ',b' | gawk -v FPAT='[^,]*' '{oFPAT=FPAT; FPAT=""; FPAT=oFPAT; print $1, $2}'
b
$
Apparently it can also be worked around just by accessing NF, e.g.:
$ echo ',b' | gawk -v FPAT='[^,]*' '{NF; print $1, $2}'
b
$
Upvotes: 3
Reputation: 67507
Not an answer, but too long for comments.
This seems like strange behavior. Here is a simpler test case.
This works fine:
$ echo ',b' | gawk -v FPAT='[^,"]*' -v OFS=_ '{print $1 "$"}'
$
This too:
$ echo ',b' | gawk -v FPAT='[^,"]*' -v OFS=_ '{print $2 "$"}'
b$
But not this:
$ echo ',b' | gawk -v FPAT='[^,"]*' -v OFS=_ '{print $1,$2 "$"}'
_$
Upvotes: 2