NaNash

Reputation: 493

Unable to parse CSV with empty field at the beginning in gawk

I wrote script1 for gawk (GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)). Here are the input and the script:


input.csv:

,b,c,d,e,f
g,h,i,j,k,l

script1:

gawk -v FPAT='[^,"]*|"([^"]|"")*"' -v OFS=, '{print $1,$2,$3,$4,$5,$6}' input.csv

script1 output:
Every field of the first line comes out empty!

,,,,,
g,h,i,j,k,l

I wrote script2 to check that the fields are actually parsed:

script2:

gawk -v FPAT='[^,"]*|"([^"]|"")*"' -v OFS=, '{for (i=1; i<=NF; i++) printf "field #%d: %s\n", i, $(i)}' input.csv

script2 output:
Field #1 is empty, as expected, so the line itself is parsed properly:

field #1: 
field #2: b
field #3: c
field #4: d
field #5: e
field #6: f
field #1: g
field #2: h
field #3: i
field #4: j
field #5: k
field #6: l

So why is script1 unable to parse a CSV line whose first field is empty in gawk?
Sincerely

Upvotes: 1

Views: 159

Answers (2)

Ed Morton

Reputation: 204099

It's a bug, which I have now reported to the GNU awk developers (see https://lists.gnu.org/archive/html/bug-gawk/2019-10/msg00041.html). Meanwhile, you can work around it this way:

$ gawk -v FPAT='[^,"]*|"([^"]|"")*"' -v OFS=, '{oFPAT=FPAT; FPAT=""; FPAT=oFPAT} {print $1,$2,$3,$4,$5,$6}' input.csv
,b,c,d,e,f
g,h,i,j,k,l

Here's a simplified version of the problem (based on @karakfa's earlier work):

$ echo ',b' | gawk -v FPAT='[^,]*' '{print $2}'
b
$ echo ',b' | gawk -v FPAT='[^,]*' '{print $1, $2}'

$

It can be worked around in the same way as a previous bug (https://lists.gnu.org/archive/html/bug-gawk/2017-04/msg00000.html):

$ echo ',b' | gawk -v FPAT='[^,]*' '{oFPAT=FPAT; FPAT=""; FPAT=oFPAT; print $1, $2}'
 b
$

Apparently it can also be worked around just by accessing NF, e.g.:

$ echo ',b' | gawk -v FPAT='[^,]*' '{NF; print $1, $2}'
 b
$

Upvotes: 3

karakfa

Reputation: 67507

Not an answer, but too long for comments.

This seems like strange behavior. Here is a simpler test case.

This works fine:

$ echo ',b' | gawk -v FPAT='[^,"]*' -v OFS=_ '{print $1 "$"}'
$

So does this:

$ echo ',b' | gawk -v FPAT='[^,"]*' -v OFS=_ '{print $2 "$"}'
b$

But this does not:

$ echo ',b' | gawk -v FPAT='[^,"]*' -v OFS=_ '{print $1,$2 "$"}'
_$

Upvotes: 2
