Reputation: 11
I've a CSV file like this:
file1
field1 field2 field3 field4
000000 nodata NP1212 1.1212
000000 212344 NP1212 1.1232
000000 343423 NX3212 1.2342
000000 542346 NX3212 1.6345
000000 nodata NZ3244 1.0345
It is tab delimited, with about 70000 rows, with 23 field and sorted by field3. How can I split it by rows to create this kind of outputs?
file1_NP1212
field1 field2 field3 field4
000000 nodata NP1212 1.1212
000000 212344 NP1212 1.1232
file1_NX3212
field1 field2 field3 field4
000000 343423 NX3212 1.2342
000000 542346 NX3212 1.6345
file1_NZ3244
field1 field2 field3 field4
000000 nodata NZ3244 1.0345
Upvotes: 1
Views: 189
Reputation: 785156
This awk
command should do the trick:
awk 'NR==1{hdr=$0; next}
{fn="file1_" $3; if (p != $3) {close(p); p=$3; print hdr > fn} print > fn}
END {close(p)}' file1
Upvotes: 2
Reputation: 189397
If you want to keep the header line in every output file and be robust against running out of file handles, try this.
awk -F '\t' 'NR == 1 { header=$0; next }
$3 != prev { close(handle); prev = $3; handle = "file1_" $3; print header >handle }
{ print >handle }' file1
This assumes the input is already sorted on field 3; if not, you will end up overwriting files with partial output.
Upvotes: 0