Split this csv/xls into separate files based on two columns?

Question

I have a 35 MB Excel file with these columns:

Index, Name, Year, AgeGroup1, AgeGroup2, AgeGroup3 [...]
1, Sweden, 1950, 20, 25, 27
2, Norway, 1950, 22, 27, 28
2, Sweden, 1951, 24, 24, 22

I'd like to split the file into several csv files based on the "Name" column (and preferably also name the files based on the value in this column).
I'd also like the files to be sorted by "Year" (but this could of course be done in Excel beforehand.)

A bash script or Kettle/Pentaho solution would be much appreciated. (Alternatives are also welcome.)

Kent · Accepted Answer

I just used the example data you pasted.

An awk one-liner can do it for you:

 awk -F, 'NR==1{title=$0;next} { print >> ($2".csv"); close($2".csv") }' yourCSV

See the test below:

kent$  l  
total 4.0K
-rw-r--r-- 1 kent kent 136 2011-10-05 11:04 t

kent$  cat t
Index, Name, Year, AgeGroup1, AgeGroup2, AgeGroup3
1, Sweden, 1950, 20, 25, 27
2, Norway, 1950, 22, 27, 28
2, Sweden, 1951, 24, 24, 22


kent$  awk -F, 'NR==1{title=$0;next} { print >> $2".csv"}' t

kent$  head *.csv
==>  Norway.csv <==
2, Norway, 1950, 22, 27, 28

==>  Sweden.csv <==
1, Sweden, 1950, 20, 25, 27
2, Sweden, 1951, 24, 24, 22

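Update: this version closes each file right after writing to it, which avoids hitting the open-file limit when there are many distinct names:

 awk -F, 'NR>1{ fname=$2".csv"; print >>(fname); close(fname);}' yourCSV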
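
A possible extension, in case you also want each file sorted by Year and starting with the header row. This is only a sketch under a few assumptions: the data has already been exported from Excel to CSV, no field contains an embedded comma or quote, and the Name values are safe to use as file names. The separator ', *' (a comma plus optional spaces) is my own choice so that the files are named Sweden.csv rather than " Sweden.csv" with a leading space, which is what plain -F, produces on this sample. The file name t and the variables header, fname and seen are just illustrative.

```shell
# Sample input copied from the question (file name t as in the test above).
cat > t <<'EOF'
Index, Name, Year, AgeGroup1, AgeGroup2, AgeGroup3
1, Sweden, 1950, 20, 25, 27
2, Norway, 1950, 22, 27, 28
2, Sweden, 1951, 24, 24, 22
EOF

# Keep the header first, sort the data rows numerically by Year (3rd field),
# then split on Name (2nd field).
{
  head -n 1 t
  tail -n +2 t | sort -t, -k3,3n
} | awk -F', *' '
  NR == 1 { header = $0; next }   # remember the header line
  {
    fname = $2 ".csv"
    if (!(fname in seen)) {       # first row for this name
      print header > fname        # truncate any old file and write the header
      close(fname)
      seen[fname] = 1
    }
    print >> fname                # append the data row unchanged
    close(fname)                  # keep the number of open files small
  }
'
```

On the sample data this should leave Sweden.csv and Norway.csv in the current directory, each beginning with the header line followed by its rows in Year order.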
