bobby
bobby

Reputation: 2789

Using awk to filter a CSV file with quotes in it

I have a text file with comma separated values. A sample line can be something like

 "Joga","Bonito",7,"Machine1","Admin"  

The " seen are part of the text and are needed when this csv gets converted back to a java object.
I want to filter out some lines from this file based on some field in the csv. The following statement doesnt work.

 awk -F "," '($2== "Bonito") {print}' filename.csv  

I am guessing that this has something to do with the " appearing in the text.
I saw an example like:

awk -F "\"*,\"*"  

I am not sure how this works. It looks like a regex, but the use of the last * flummoxed me.

Is there a better option than the last awk statement I wrote? How does it work?

Upvotes: 2

Views: 1015

Answers (2)

James Brown
James Brown

Reputation: 37464

First a bit more through test file:

$ cat file
"Joga","Bonito",7,"Machine1","Admin"
"Joga",Bonito,7,"Machine1","Admin"

Using regex ^\"? ie. starts with or without a double quote:

$ awk -F, '$2~/^\"?Bonito\"?$/' file
"Joga","Bonito",7,"Machine1","Admin"
"Joga",Bonito,7,"Machine1","Admin"

Upvotes: 1

oliv
oliv

Reputation: 13259

Since some parameters have double quotes and other not, you can filter with a quoted parameter:

awk -F, '$2 == "\"Bonito\""' filename.csv

To filter on parameter that do not have double quote, just do:

awk -F, '$3 == 7' filename.csv

Another way is to use the double quote in the regex (the command ? that make the double quote optional):

 awk -F '"?,"?' '$2 == "Bonito"' filename.csv

But this has a drawback of also matching the following line:

"Joga",Bonito",7,"Machine1","Admin"

Upvotes: 2

Related Questions