Reputation: 6841
I have a tab delimited file that I would like to split into smaller files based off of two columns. My data looks like the following:
360.40 hockey james april expensive
1200.00 hockey james may expensive
124.33 baseball liam april cheap
443.12 soccer john may moderate
I want to parse these rows by the third and fifth columns.
The end result would be three different files named after the third and fifth columns like this:
james-expensive.tsv
liam-cheap.tsv
john-moderate.tsv
In each of those files I want only the first value of each row associated with that name/expense type. So james-expensive.tsv, for example, would contain one column:
360.40
1200.00
I thought maybe some sort of awk script or sed script may be able to solve this, but I'm not quite sure where to start.
If it seems like a bad idea to do this with either awk or sed, that would help to know too.
Upvotes: 1
Views: 837
Reputation: 75458
Using awk:
awk '{ print $1 > $3 "-" $5 ".tsv" }' your_file
Result:
$ for F in *.tsv; do echo "---- $F ----"; cat "$F"; done
---- james-expensive.tsv ----
360.40
1200.00
---- john-moderate.tsv ----
443.12
---- liam-cheap.tsv ----
124.33
Update for nawk:
awk '{ f = $3 "-" $5 ".tsv"; print $1 > f }' your_file
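To avoid having too many files open at once, truncate each output file the first time it is seen, then append to it and close it after every write: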
awk '{ f = $3 "-" $5 ".tsv" } !a[f]++ { printf "" > f } { print $1 >> f; close(f) }' your_file
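The same idea spelled out over several lines may be easier to follow. This is only a sketch: the input name sample_input.txt and the seen array are made up for illustration, and it assumes the fields really are separated by single tabs. Setting -F '\t' keeps a field that contains spaces in one piece, and the extra close() after truncating is just to make each step explicit.

```shell
# Hypothetical sample input, tab-delimited, matching the rows in the question.
printf '360.40\thockey\tjames\tapril\texpensive\n'  > sample_input.txt
printf '1200.00\thockey\tjames\tmay\texpensive\n'  >> sample_input.txt
printf '124.33\tbaseball\tliam\tapril\tcheap\n'    >> sample_input.txt
printf '443.12\tsoccer\tjohn\tmay\tmoderate\n'     >> sample_input.txt

# -F '\t' splits on tabs only (assumption: the file is strictly tab-delimited).
awk -F '\t' '
{
    f = $3 "-" $5 ".tsv"       # output name built from columns 3 and 5
    if (!(f in seen)) {        # first time this name appears in this run
        printf "" > f          # create or truncate the file
        close(f)
        seen[f] = 1
    }
    print $1 >> (f)            # append the first column
    close(f)                   # keep at most one output file open
}' sample_input.txt
```

Closing after every write costs an open/close per row, so for small inputs with few distinct names the simpler one-liners above are probably enough.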
Upvotes: 2
Reputation: 5805
You didn't tag perl, but here is a one-liner:
perl -lane '`echo "$F[0]" >> $F[2]-$F[4].tsv`' file
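Note that the one-liner runs a shell echo ... >> for every input line, so a name containing spaces or shell metacharacters would be interpreted by the shell. For comparison, here is a rough sketch of the same append-per-row approach written directly as a shell loop. The input name sample_input.txt, the out_demo directory, and the variable names are all made up for illustration, and it assumes no field is empty (tab counts as IFS whitespace, so consecutive tabs collapse).

```shell
# Hypothetical sample input, tab-delimited, matching the rows in the question.
printf '360.40\thockey\tjames\tapril\texpensive\n'  > sample_input.txt
printf '1200.00\thockey\tjames\tmay\texpensive\n'  >> sample_input.txt
printf '124.33\tbaseball\tliam\tapril\tcheap\n'    >> sample_input.txt
printf '443.12\tsoccer\tjohn\tmay\tmoderate\n'     >> sample_input.txt

# >> appends, so clear out results from any earlier run first.
mkdir -p out_demo
rm -f out_demo/*.tsv

tab=$(printf '\t')
# Read five tab-separated fields per line; -r keeps backslashes literal.
while IFS="$tab" read -r amount sport name month cost; do
    printf '%s\n' "$amount" >> "out_demo/${name}-${cost}.tsv"
done < sample_input.txt
```

For large files this loop will be noticeably slower than awk, so treat it as an illustration of what the append is doing rather than a replacement.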
Upvotes: 0