Reputation: 25
I have a CSV with multiple rows some of which look like this
"ABC","Unfortunately, system has failed"," - Error in system"
"DEF","Check the button labelled "WARNING"","Warning in system"
"XYZ","Everything is okay","No errors"
I need to split these lines and extract the columns such as
I run a loop for each of the rows and extract the 2nd column as
awk -F , '{print $2}' $line
where $line
represents each row. However, I end up getting incorrect values. For example, while trying to fetch 1st row 2nd column, using the above command gives me "Unfortunately
and not "Unfortunately, system has failed"
I understand that my strings have both commas and quotes in them which makes it harder to split based on a delimiter. Is there anything else I can try?
Upvotes: 1
Views: 196
Reputation: 1126
If you want to put your input data in a 3x3 table
you can do it with awk
:
awk -v FS=',[^ ]' -v OFS="|" '{print $1, $2, $3}' file
ABC"|Unfortunately, system has failed"| - Error in system"
"DEF"|Check the button labelled "WARNING""|Warning in system"
"XYZ"|Everything is okay"|No errors"
the code:
FS=',[^ ]'
? Just that comma in , system
is not a separator.Upvotes: 1
Reputation: 37404
Using GNU awk and FPAT
:
$ gawk '
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")"
}
{
print $2
}' file
Output:
"Unfortunately, system has failed"
"Check the button labelled "WARNING""
"Everything is okay"
It's not complete CSV parser, for example newlines inside quotes are not handled, you need to deal with them yourself (check NF
and combine records). More about FPAT
:
https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
If you want to get rid of those quotes:
$ gawk '
BEGIN {
FPAT="([^,]+)|(\"[^\"]+\")"
}
{
for(i=1;i<=NF;i++) # loop all fields
gsub(/^"|"$/,"",$i) # remove quotes surrounding fields
print $2
}' file
Output sample:
...
Check the button labelled "WARNING"
...
Upvotes: 1