Reputation: 187
I need to "extract" certain columns from a CSV file. The list of columns to extract is long and their indices do not follow a regular pattern. So far I've come up with a regular expression for a comma-separated value but I find it frustrating that in the RHS side of sed's substitute command I cannot reference more than 9 saved strings. Any ideas around this?
Note that comma-separated values that contain a comma must be quoted so that the comma is not mistaken for a field delimiter. I'd appreciate a solution that can handle such values properly. Also, you can assume that no value contains a new line character.
Upvotes: 2
Views: 2635
Reputation: 203209
With GNU awk:
$ cat file
a,"b,c",d,e
$ awk -vFPAT='([^,]*)|("[^"]+")' '{print $2}' file
"b,c"
$ awk -vFPAT='([^,]*)|("[^"]+")' '{print $3}' file
d
$ cat file
a,"b,c",d,e,"f,g,h",i,j
$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, -vcols="1,5,7,2" 'BEGIN{n=split(cols,a,/,/)} {for (i=1;i<=n;i++) printf "%s%s", $(a[i]), (i<n?OFS:ORS)}' file
a,"f,g,h",j,"b,c"
See http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content for details. I doubt if it'd handle escaped double quotes embedded in a field, e.g. a,"b""c",d
or a,"b\"c",d
.
See also What's the most robust way to efficiently parse CSV using awk? for how to parse CSVs with awk in general.
Upvotes: 3
Reputation: 1292
There is command-line csvtool
available - https://colin.maudry.com/csvtool-manual-page/
# apt-get install csvtool
Upvotes: 0
Reputation: 3451
Fully fledged CSV parsers such as Perl's Text::CSV_XS
are purpose-built to handle that kind of weirdness.
I provided sample code within my answer here: parse csv file using gawk
Upvotes: 0
Reputation: 157947
CSV is not that easy to parse like it might look in the first place. This is because there can be a plenty of different delimiters or fixed column widths to separate the data, and also the data may contain the delimiter itself (escaped).
Like I already told here I would use a programming language which supports a CVS library for that.
Use
Upvotes: 1