Marco Falzone
Marco Falzone

Reputation: 139

Split CSV in multiple files, reading column 1 for naming output files

I have an employees.csv file with around 500 rows and 11 columns, column fileds are limited by double quotes:

"1","Paula","Paula's Role","Paula's Job Description","Paula's Department","11/10/2008","8","14","10","24","0
"2","John","John's Role","John's Job Description","John's Department","11/10/2008","2","17","6","11","0"
"3","Mark","Mark's Role","Mark's Job Description","Mark's Department","11/10/2008","4","17","13","44","0"
:
:
(more records)
:
:
"499","Maria","Maria's Role","Maria's Job Description","Maria's Department","11/10/2008","8","15","2","9","0"
"500","Peter","Peter's Role","Peters's Job Description","Peters's Department","11/10/2008","8","17","16","22","0"

I'm trying to figure out how to split such file in multiple csv (one line = one file), according to the first field (unique employee ID number). The output of the command shall be 500 separate csv files, containing 1 row each one, and named as follows:

1.csv
2.csv
3.csv
:
:
:
499.csv
500.csv

I've been trying with combination of cat and awk, but there is some mistake in the code:

for i in $(cat unix | awk -F\, '{print $1}' /myfolder/employees.csv);

    do
        grep $i "/myfolder/employees.csv" > "/myfolder/splittedfiles/$i";
    done

Many thanks.

Upvotes: 2

Views: 808

Answers (2)

Lars Fischer
Lars Fischer

Reputation: 10149

You could use GNU awk like this:

awk 'BEGIN {FPAT="[^\"]+"} { print $0 > "/myfolder/splittedfiles/"$1".csv" }' yourfile 

FPAT defines field content by a regular expression, here it helps us to strip the quotes from $1.

Upvotes: 2

cxw
cxw

Reputation: 17041

Edit (and this one is tested), this gawk script does the job for me:

gawk -F'"' -- '{print $0 >> ("/myfolder/splittedfiles/" $2 ".csv")}' /myfolder/employees.csv

The -F'"' splits fields at ", so the employee number is in $2. Then ("/myfolder/splittedfiles/" $2 ".csv") builds the filename you want, and print $0 >> ... prints the original line to that file.


Or if the fields are always in numerical order starting from 1, this should work (not tested)

split -l 1 /myfolder/employees.csv /myfolder/splittedfiles/EMPL
empno=1
for fname in /myfolder/splittedfiles/EMPL* ; do
    mv "$f" "/myfolder/splittedfiles/${empno}.csv"
    empno=$((empno+1))
done

split makes each line (-l 1) a separate file. The for loops over those files in order. The mv renames each file to be ${empno}.csv, starting from empno=1. Then $((empno+1)) increments empno.

Upvotes: 0

Related Questions