Reputation: 27238
I'm using gawk
and its FPAT = "([^,]*)|(\"[^\"]+\")"
feature, to parse CSV files, which may or may not have individual values wrapped within quotes. (I don't think any values themselves have quotes within, but they do frequently have commas.)
The resulting output will be SQL (for sqlite3), so, I need to ensure that all text values are quoted. However, if I simply quote all values from the input when printing into output, then those values that were also quoted within the original CSV file end up having double double-quotes, and things don't work.
How do I remove the double quotes with awk from the individual fields (only to unconditionally add them later on when printing the output)? Or only print the quotes conditionally in the first place?
Upvotes: 1
Views: 158
Reputation: 40778
I think you have to check each field. Like:
BEGIN { FPAT="([^,]*)|(\"[^\"]+\")"}
{
for (i=1; i<=NF;i++) {
if (substr($i,1,1)!="\"")
$i="\""$i"\""
print $i
}
}
For example input:
adf," asdfas, d",adsf
4,5," dafs"
The output is:
"adf"
" asdfas, d"
"adsf"
"4"
"5"
" dafs"
Update
To only print a set of columns:
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")"
a="1 3"
n=split(a,b," ")
}
{
for (i=1; i<=n; i++) {
r=$(b[i])
if (substr(r,1,1)!="\"")
r="\""r"\""
print r
}
}
Upvotes: 1