Village

Reputation: 24393

Simplest scripting language for working with CSVs

I like using Python because of its easy-to-learn syntax. However, I recently learned that it has no support for UTF-8 when working with CSVs. Since I often use CSVs, this is a serious problem for me. Is there another scripting language with a simple syntax that I can learn for managing really large UTF-8 CSV files?

Upvotes: 0

Views: 202

Answers (2)

Joel

Reputation: 30156

If you're working on the command line and can install another command-line tool, I'd strongly recommend csvfix.

Once it is installed, you can robustly query any CSV file, e.g.

csvfix order -f 1,3 file.csv

will extract the 1st and 3rd columns of a CSV file.

There is a full list of commands here

See this related question

Upvotes: 2

Zsolt Botykai

Reputation: 51613

I'd recommend using gawk. E.g.:

awk -F ";" '{print $1 ";" $2}' FILE.csv

would print the first two (;-separated) columns of FILE.csv. To work properly with UTF-8, you should set the locale, like this:

LC_ALL=C awk 'BEGIN {print length("árvíztűrőtükörkúrópék")}'
=> 30

LC_ALL=en_US.utf8 awk 'BEGIN {print length("árvíztűrőtükörkúrópék")}'
=> 21

(Or you can set LC_ALL globally if you're using UTF-8 all the time, and you're on *nix, e.g. in .bashrc, export LC_ALL=en_US.utf8.)
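To see where the 30 comes from: the sample word has 21 characters, nine of which are accented and take two bytes each in UTF-8, so the result under LC_ALL=C matches the byte count. A quick sketch that checks this without depending on any installed locale (it assumes the script itself is saved as UTF-8; the variable names are only for illustration):

```shell
# wc -c counts bytes regardless of the current locale.
word='árvíztűrőtükörkúrópék'

# Strip any padding that some wc implementations add around the number.
bytes=$(printf '%s' "$word" | wc -c | tr -d '[:space:]')

echo "$bytes"
```

This prints 30, the same number awk reports when it is not told that the input is UTF-8.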

awk is an old but really powerful and fast tool.
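As a small illustration of the column extraction above on UTF-8 data, here is a sketch with a made-up sample file (sample.csv and its contents are assumptions for the demo only). Since the ; delimiter is a single ASCII byte, which never occurs inside a UTF-8 multibyte sequence, splitting on it should behave the same whatever locale is set:

```shell
# Create a hypothetical ;-separated sample file in a temporary directory.
tmpdir=$(mktemp -d)
printf 'árvíz;tűrő;tükör\nkúró;pék;vége\n' > "$tmpdir/sample.csv"

# Print the first two columns of every row, re-joined with ";".
result=$(awk -F ';' '{print $1 ";" $2}' "$tmpdir/sample.csv")
echo "$result"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Note that this simple split does not handle quoted fields that contain the delimiter itself.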

HTH

Upvotes: 1
