Reputation: 24393
I like using Python because of its easy-to-learn syntax. However, I recently learned that it has no support for UTF-8 when working with CSVs. Since I often use CSVs, this seems like a serious problem for me. Is there another scripting language with a simple syntax that I can learn for managing really large UTF-8 CSV files?
Upvotes: 0
Views: 202
Reputation: 30156
If you're working on the command line and can install another command-line tool, I'd strongly recommend csvfix.
Once installed, you can robustly query any CSV file, e.g.:
csvfix order -f 1,3 file.csv
will extract the 1st and 3rd columns of a CSV file.
There is a full list of commands here
See this related question
Upvotes: 2
Reputation: 51613
I'd recommend using gawk. E.g.:
awk -F ";" '{print $1 ";" $2}' FILE.csv
would print the first two (;-separated) columns of FILE.csv. To work properly with UTF-8, you should use it like:
LC_ALL=C awk 'BEGIN {print length("árvíztűrőtükörkúrópék")}'
=> 30
LC_ALL=en_US.utf8 awk 'BEGIN {print length("árvíztűrőtükörkúrópék")}'
=> 21
(Or you can set LC_ALL globally if you're using UTF-8 all the time and you're on *nix, e.g. by putting export LC_ALL=en_US.utf8 in your .bashrc.)
awk is an old, but really powerful and fast tool.
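To try the column extraction from above end to end, here is a minimal sketch. The sample file and its contents are made up for illustration, and it assumes mktemp and any POSIX awk are available. Because the separator is a plain ASCII ;, the split itself should come out the same whatever the locale; the locale matters for character-aware functions such as length. Note that this naive split does not handle quoted fields that contain a ; themselves.

```shell
# Build a small semicolon-separated UTF-8 sample (hypothetical file and data).
sample=$(mktemp)
printf '%s\n' 'név;város;kor' 'Zoë;Győr;31' 'Ákos;Pécs;27' > "$sample"

# Print the first two columns, re-joined with the same separator.
result=$(awk -F ';' '{print $1 ";" $2}' "$sample")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

For a really large file you would point awk straight at the file rather than capturing the output in a variable; the variable is only there to make the result easy to inspect.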
HTH
Upvotes: 1