Splitting a CSV column wise with no unique delimiters in shell script

Question

I have a CSV with multiple rows some of which look like this

"ABC","Unfortunately, system has failed"," - Error in system"
"DEF","Check the button labelled "WARNING"","Warning in system"
"XYZ","Everything is okay","No errors"

I need to split these lines and extract the columns such as

I run a loop for each of the rows and extract the 2nd column as

awk -F , '{print $2}' $line

where $line represents each row. However, I end up getting incorrect values. For example, while trying to fetch 1st row 2nd column, using the above command gives me "Unfortunately and not "Unfortunately, system has failed"

I understand that my strings have both commas and quotes in them which makes it harder to split based on a delimiter. Is there anything else I can try?

James Brown · Accepted Answer

Using GNU awk and FPAT:

$ gawk '
BEGIN {
    FPAT="([^,]*)|(\"[^\"]+\")"
}
{
    print $2
}' file

Output:

"Unfortunately, system has failed"
"Check the button labelled "WARNING""
"Everything is okay"

It's not complete CSV parser, for example newlines inside quotes are not handled, you need to deal with them yourself (check NF and combine records). More about FPAT:

https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html

If you want to get rid of those quotes:

$ gawk '
BEGIN {
    FPAT="([^,]+)|(\"[^\"]+\")"
}
{
    for(i=1;i<=NF;i++)            # loop all fields
        gsub(/^"|"$/,"",$i)       # remove quotes surrounding fields
    print $2
}' file

Output sample:

...
Check the button labelled "WARNING"
...

Splitting a CSV column wise with no unique delimiters in shell script

Answers (2)

Related Questions