Reputation: 57
I am trying to split a large CSV into smaller files based on date using awk. I have the essential command working, although it returns a "too many open files" error. I've read that I should close the files, yet the command as written closes each file after only one row has been written to it.
awk -F' ' '{close($1".csv")}{print > ($1".csv")}' 2015full.csv
Additionally I would like to add a header row to each split file and an index column. My data looks like:
2015full.csv
2015-12-24 18:20:57 -87.2788204 36.5984675 0
2015-12-24 18:20:42 -87.2784049 36.597298699999996 0
2015-12-24 18:20:26 -87.274402 36.5932405 0
2015-12-23 18:20:10 -87.25762519999999 36.572330400000006 0
2015-12-23 18:19:40 -87.25762519999999 36.572330400000006 0
2015-12-23 18:19:21 -87.25762519999999 36.572330400000006 0
And I'm trying to get:
2015-12-24.csv
num date time lon lat
1 2015-12-24 18:20:57 -87.2788204 36.5984675
2 2015-12-24 18:20:42 -87.2784049 36.597298699999996
3 2015-12-24 18:20:26 -87.274402 36.5932405
2015-12-23.csv
num date time lon lat
1 2015-12-23 18:20:10 -87.25762519999999 36.572330400000006
2 2015-12-23 18:19:40 -87.25762519999999 36.572330400000006
3 2015-12-23 18:19:21 -87.25762519999999 36.572330400000006
I may have the right pieces below:
awk -F' ' 'NR==1{print "num", $0; "date", $1; "time", $2; "lon", $3; "lat", $4; next}{print (NR-1), $0}{close($1".csv")}{print > ($1".csv")}' 2015full.csv
but they are not in an order that creates a working command for my purposes. Does anyone have a suggestion for me? Thanks!
Upvotes: 4
Views: 636
Reputation: 1399
If you don't want to use awk:
for i in $(cut -d ' ' -f1 2015full.csv | uniq); do grep -w "$i" 2015full.csv | nl -w1 -s ' ' | sed "1i num date time lon lat" > "$i.csv"; done
Upvotes: 0
Reputation: 203254
awk '
BEGIN { hdr = "num" OFS "date" OFS "time" OFS "lon" OFS "lat" }
$1!=prev { close(out); out=$1".csv"; print hdr > out; idx=0; prev=$1 }
{ print ++idx, $0 > out }
' 2015full.csv
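A quick way to sanity-check the script above, with a small sample inlined in place of 2015full.csv (the filename sample.csv is just an assumption for the demo):

```shell
# Build a tiny input with the same field layout as the question
printf '%s\n' \
  '2015-12-24 18:20:57 -87.2788204 36.5984675 0' \
  '2015-12-24 18:20:42 -87.2784049 36.597298699999996 0' \
  '2015-12-23 18:20:10 -87.25762519999999 36.572330400000006 0' > sample.csv

# On a date change: close the previous file, open the new one, write the
# header, and reset the per-file counter. Every line then gets an index.
awk '
BEGIN { hdr = "num" OFS "date" OFS "time" OFS "lon" OFS "lat" }
$1!=prev { close(out); out=$1".csv"; print hdr > out; idx=0; prev=$1 }
{ print ++idx, $0 > out }
' sample.csv

head -1 2015-12-24.csv   # prints: num date time lon lat
```

Note this keeps the trailing `0` column from `$0`; strip it beforehand (e.g. with `NF--` in gawk or `cut`) if you don't want it in the split files.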
Upvotes: 0
Reputation: 133458
Though you are closing the files, you are closing them on every line: when `$1`'s value changes, the previous `$1`'s `.csv` file would otherwise still be open, so the right fix is to close a file only when `$1` changes rather than on each line. Try the following (this assumes your Input_file is sorted by the first column, as in your sample; if not, sort it by the first column first and pipe the result to the awk command).
awk -F' ' 'prev!=$1{close(prev".csv")}{print > ($1".csv");prev=$1}' 2015full.csv
EDIT: I see you also need a header row in each per-date output file; if so, the following code should do that too.
awk -F' ' 'prev!=$1{close(prev".csv");print "num date time lon lat" > ($1".csv")}{print > ($1".csv");prev=$1}' 2015full.csv
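If the input is not already grouped by date, the on-change close above won't help, so sort by the first column first, as mentioned. A minimal sketch of that pipeline, using a small inlined sample (sample.csv is an assumed stand-in for your Input_file):

```shell
# Unsorted sample: the two 2015-12-24 rows are separated by a 2015-12-23 row
printf '%s\n' \
  '2015-12-24 18:20:57 -87.27 36.59 0' \
  '2015-12-23 18:20:10 -87.25 36.57 0' \
  '2015-12-24 18:20:42 -87.27 36.59 0' > sample.csv

# Group rows by date first, then split; awk closes each file once,
# when the date in $1 changes
sort -k1,1 sample.csv |
  awk -F' ' 'prev!=$1{close(prev".csv"); prev=$1}{print > ($1".csv")}'
```

After this, each date's rows land together in its own file regardless of their order in the original input.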
Upvotes: 3