cnst
cnst

Reputation: 27238

awk: print quotes when input CSV may or may not have them

I'm using gawk and its FPAT = "([^,]*)|(\"[^\"]+\")" feature, to parse CSV files, which may or may not have individual values wrapped within quotes. (I don't think any values themselves have quotes within, but they do frequently have commas.)

The resulting output will be SQL (for sqlite3), so, I need to ensure that all text values are quoted. However, if I simply quote all values from the input when printing into output, then those values that were also quoted within the original CSV file end up having double double-quotes, and things don't work.

How do I remove the double quotes with awk from the individual fields (only to unconditionally add them later on when printing the output)? Or only print the quotes conditionally in the first place?

Upvotes: 1

Views: 158

Answers (1)

Håkon Hægland
Håkon Hægland

Reputation: 40778

I think you have to check each field. Like:

BEGIN { FPAT="([^,]*)|(\"[^\"]+\")"}
{
    for (i=1; i<=NF;i++) {
        if (substr($i,1,1)!="\"")
            $i="\""$i"\""
        print $i
    }

}

For example input:

adf," asdfas, d",adsf
4,5," dafs"

The output is:

"adf"
" asdfas, d"
"adsf"
"4"
"5"
" dafs"

Update

To only print a set of columns:

BEGIN { 
    FPAT="([^,]*)|(\"[^\"]+\")"
    a="1 3"
    n=split(a,b," ")
}
{
    for (i=1; i<=n; i++) {
        r=$(b[i])
        if (substr(r,1,1)!="\"")
            r="\""r"\""
        print r
    }
}

Upvotes: 1

Related Questions