CluelessProgrammer
CluelessProgrammer

Reputation: 25

Splitting a CSV column wise with no unique delimiters in shell script

I have a CSV with multiple rows some of which look like this

"ABC","Unfortunately, system has failed"," - Error in system"
"DEF","Check the button labelled "WARNING"","Warning in system"
"XYZ","Everything is okay","No errors"

I need to split these lines and extract the columns such as

enter image description here

I run a loop for each of the rows and extract the 2nd column as

awk -F , '{print $2}' $line 

where $line represents each row. However, I end up getting incorrect values. For example, while trying to fetch 1st row 2nd column, using the above command gives me "Unfortunately and not "Unfortunately, system has failed"

I understand that my strings have both commas and quotes in them which makes it harder to split based on a delimiter. Is there anything else I can try?

Upvotes: 1

Views: 196

Answers (2)

Carlos Pascual
Carlos Pascual

Reputation: 1126

If you want to put your input data in a 3x3 table you can do it with awk:

awk -v FS=',[^ ]' -v OFS="|" '{print $1, $2, $3}' file
ABC"|Unfortunately, system has failed"| - Error in system"
"DEF"|Check the button labelled "WARNING""|Warning in system"
"XYZ"|Everything is okay"|No errors"

the code:

  • why setting FS=',[^ ]'? Just that comma in , system is not a separator.

Upvotes: 1

James Brown
James Brown

Reputation: 37404

Using GNU awk and FPAT:

$ gawk '
BEGIN {
    FPAT="([^,]*)|(\"[^\"]+\")"
}
{
    print $2
}' file

Output:

"Unfortunately, system has failed"
"Check the button labelled "WARNING""
"Everything is okay"

It's not complete CSV parser, for example newlines inside quotes are not handled, you need to deal with them yourself (check NF and combine records). More about FPAT:

https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html

If you want to get rid of those quotes:

$ gawk '
BEGIN {
    FPAT="([^,]+)|(\"[^\"]+\")"
}
{
    for(i=1;i<=NF;i++)            # loop all fields
        gsub(/^"|"$/,"",$i)       # remove quotes surrounding fields
    print $2
}' file

Output sample:

...
Check the button labelled "WARNING"
...

Upvotes: 1

Related Questions