Reputation: 12384
I have been happily using gawk with FPAT. Here's the script I use for my examples:
#!/usr/bin/gawk -f
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")"
}
{
for (i=1; i<=NF; i++) {
printf "Record #%s, field #%s: %s\n", NR, i, $i
}
}
Works well.
$ echo 'a,b,c,d' | ./test.awk
Record #1, field #1: a
Record #1, field #2: b
Record #1, field #3: c
Record #1, field #4: d
Works well.
$ echo '"a","b",c,d' | ./test.awk
Record #1, field #1: "a"
Record #1, field #2: "b"
Record #1, field #3: c
Record #1, field #4: d
Works well.
$ echo '"a","b",,d' | ./test.awk
Record #1, field #1: "a"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
Works well.
$ echo '"""a"": aaa","b",,d' | ./test.awk
Record #1, field #1: """a"": aaa"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
Fails.
$ echo '"""a"": aaa,","b",,d' | ./test.awk
Record #1, field #1: """a"": aaa
Record #1, field #2: ","
Record #1, field #3: b"
Record #1, field #4:
Record #1, field #5: d
Expected output:
$ echo '"""a"": aaa,","b",,d' | ./test_that_would_be_working.awk
Record #1, field #1: """a"": aaa,"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
Is there a regex for FPAT that would make this work, or is this just not supported by awk?
The pattern would be a " followed by anything but a single ". A bracket expression matches one character at a time, so it can't exclude a lone " while still allowing "".
I think there may be an option with lookaround, but I'm not good enough with it to make it work.
Upvotes: 8
Views: 1271
Reputation: 2892
Because the regexes gawk uses for FPAT don't support lookarounds, you need to spell out the alternatives explicitly. This one will do:
FPAT="[^,\"]*|\"([^\"]|\"\")*\""
Explanation:
[^,\"]*    # 0 or more characters other than , and "
|          # OR
\"         # an opening "
([^\"]     # then either a character other than "
|          # OR
\"\"       # an escaped quote, ""
)*         # with that group repeated 0 or more times
\"         # and a closing "
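To see the quoted branch on its own, here is a minimal sketch that applies it with match() instead of FPAT, so it should run in any POSIX awk. The helper name quoted_prefix is made up for illustration, and the pattern is anchored with ^ only to keep the demo simple.

```shell
# quoted_prefix is a hypothetical helper name, not part of the answer.
# It prints the quoted field at the start of each input line, if any.
quoted_prefix() {
    awk '{
        # Same quoted branch as in FPAT, anchored at the start of the line:
        # an opening ", then any run of non-quote characters or "" pairs,
        # then a closing ".
        if (match($0, /^"([^"]|"")*"/))
            print substr($0, RSTART, RLENGTH)
    }'
}

echo '"""a"": aaa,","b",,d' | quoted_prefix
```

The match cannot stop at the comma inside the quotes: a comma counts as an ordinary non-quote character there, and the closing " is only accepted where it is not part of a "" pair.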
Now testing it:
$ cat tst.awk
BEGIN {
FPAT="[^,\"]*|\"([^\"]|\"\")*\""
}
{
for (i=1; i<=NF; i++){ printf "Record #%s, field #%s: %s\n", NR, i, $i }
}
$ echo '"""a"": aaa,","b",,d' | awk -f tst.awk
Record #1, field #1: """a"": aaa,"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
Upvotes: 4