Luteser

Reputation: 57

Split CSV and add headers and index column with awk

I am trying to split a large CSV into smaller files based on date using awk. I have the essential command working, although it eventually fails with a "too many open files" error. I've read that I should close each file, yet the command as written closes the file after every row, so only one row ends up in each output file.

awk -F' ' '{close($1".csv")}{print > ($1".csv")}' 2015full.csv 

Additionally I would like to add a header row to each split file and an index column. My data looks like:

2015full.csv

2015-12-24 18:20:57 -87.2788204 36.5984675 0
2015-12-24 18:20:42 -87.2784049 36.597298699999996 0
2015-12-24 18:20:26 -87.274402 36.5932405 0
2015-12-23 18:20:10 -87.25762519999999 36.572330400000006 0
2015-12-23 18:19:40 -87.25762519999999 36.572330400000006 0
2015-12-23 18:19:21 -87.25762519999999 36.572330400000006 0

And I'm trying to get:

2015-12-24.csv

num date time lon lat
1 2015-12-24 18:20:57 -87.2788204 36.5984675
2 2015-12-24 18:20:42 -87.2784049 36.597298699999996
3 2015-12-24 18:20:26 -87.274402 36.5932405

2015-12-23.csv

num date time lon lat
1 2015-12-23 18:20:10 -87.25762519999999 36.572330400000006
2 2015-12-23 18:19:40 -87.25762519999999 36.572330400000006
3 2015-12-23 18:19:21 -87.25762519999999 36.572330400000006

I may have the right pieces below:

awk -F' ' 'NR==1{print "num", $0; "date", $1; "time", $2; "lon", $3; "lat", $4; next}{print (NR-1), $0}{close($1".csv")}{print > ($1".csv")}' 2015full.csv

but they are not in an order that creates a working command for my purposes. Does anyone have a suggestion for me? Thanks!

Upvotes: 4

Views: 636

Answers (3)

once

Reputation: 1399

If you're not set on awk:

for i in $(cut -d ' ' -f1 2015full.csv|uniq);do grep -w $i 2015full.csv|nl -w1 -s ' ' |sed "1i num date time lon lat" >$i.csv; done
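The same idea, sketched with quoting around `$i` and `sort -u` instead of `uniq` (so it also works when the input is not grouped by date); these changes and the inlined sample data are additions, not part of the answer above. Note the `sed "1i text"` form is a GNU sed extension, as in the original, and that this approach rescans the whole input once per distinct date, which is fine for modest files:

```shell
# Sample rows from the question, inlined so the sketch is self-contained
printf '%s\n' \
  '2015-12-24 18:20:57 -87.2788204 36.5984675 0' \
  '2015-12-23 18:20:10 -87.25762519999999 36.572330400000006 0' > 2015full.csv

# One pass per date: grep that date's rows, number them with nl,
# prepend the header with sed (GNU syntax), write to <date>.csv
for i in $(cut -d' ' -f1 2015full.csv | sort -u); do
  grep -w "$i" 2015full.csv | nl -w1 -s' ' | sed '1i num date time lon lat' > "$i.csv"
done
```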

Upvotes: 0

Ed Morton

Reputation: 203254

awk '
    BEGIN { hdr = "num" OFS "date" OFS "time" OFS "lon" OFS "lat" }
    $1!=prev { close(out); out=$1".csv"; print hdr > out; idx=0; prev=$1 }
    { print ++idx, $0 > out }
' 2015full.csv
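The script above assumes rows for each date are contiguous, as in the sample. If they might not be, a stable sort pre-pass keeps the close-on-change logic correct; the sketch below adds that pre-pass and also prints only fields 1–4 to drop the trailing flag column, matching the desired output (both are additions to the answer above, not part of it):

```shell
# Sample rows from the question, deliberately out of date order
printf '%s\n' \
  '2015-12-24 18:20:57 -87.2788204 36.5984675 0' \
  '2015-12-23 18:20:10 -87.25762519999999 36.572330400000006 0' \
  '2015-12-24 18:20:42 -87.2784049 36.597298699999996 0' > 2015full.csv

# -s (stable) preserves the original order of rows within each date
sort -s -k1,1 2015full.csv | awk '
    BEGIN { hdr = "num" OFS "date" OFS "time" OFS "lon" OFS "lat" }
    $1 != prev { close(out); out = $1 ".csv"; print hdr > out; idx = 0; prev = $1 }
    { print ++idx, $1, $2, $3, $4 > out }   # fields 1-4 only: drop trailing flag
'

head 2015-12-24.csv
# → num date time lon lat
#   1 2015-12-24 18:20:57 -87.2788204 36.5984675
#   2 2015-12-24 18:20:42 -87.2784049 36.597298699999996
```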

Upvotes: 0

RavinderSingh13

Reputation: 133458

You are closing the files, but you close and reopen them on every single line, which truncates each file again and again. Instead, close the previous file only when $1's value changes. Try the following (this assumes your Input_file is sorted by the first column, as in your sample; if not, sort it by the first column and pipe the result to the awk command):

awk -F' ' 'prev!=$1{close(prev".csv")}{print > ($1".csv");prev=$1}' 2015full.csv 

EDIT: I see you also need a header row in each output file; if so, the following should do that as well.

awk -F' ' 'prev!=$1{close(prev".csv");print "num date time lon lat" > ($1".csv")}{print > ($1".csv");prev=$1}' 2015full.csv

Upvotes: 3
