Split file based off of two columns in bash

Question

I have a tab-delimited file that I would like to split into smaller files based on two columns. My data looks like the following:

    360.40   hockey   james  april  expensive
    1200.00  hockey   james  may    expensive
    124.33   baseball liam   april  cheap
    443.12   soccer   john   may    moderate

I want to parse these rows by the third and fifth columns.

The end result would be three different files named after the third and fifth columns like this:

    james-expensive.tsv
    liam-cheap.tsv
    john-moderate.tsv

In each of those files I want only the first value of each row associated with that name/expense type. So james-expensive.tsv, for example, would contain a single column:

    360.40
    1200.00

I thought maybe some sort of awk script or sed script may be able to solve this, but I'm not quite sure where to start.

If it seems like a bad idea to do this with either awk or sed, that would help to know too.

konsolebox · Accepted Answer

Using awk:

awk '{ print $1 > $3 "-" $5 ".tsv" }' your_file

Result:

$ for F in *.tsv; do echo "---- $F ----"; cat "$F"; done
---- james-expensive.tsv ----
360.40
1200.00
---- john-moderate.tsv ----
443.12
---- liam-cheap.tsv ----
124.33

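Update for nawk: some awk implementations, such as nawk, may not parse the unparenthesized concatenation after `>` as a single file name, so build the name in a variable first: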

awk '{ f = $3 "-" $5 ".tsv"; print $1 > f }' your_file

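To avoid hitting the limit on simultaneously open files when there are many distinct name/type combinations, truncate each output file the first time it is seen, then append to it and close it after every write: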

awk '{ f = $3 "-" $5 ".tsv" } !a[f]++ { printf "" > f } { print $1 >> f; close(f) }' your_file
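To try the last variant without touching existing files, here is a minimal, self-contained sketch. It rests on a few assumptions that are not in the answer above: the `split_demo` directory, the `input.txt` file name, and the `outdir` and `seen` variables are hypothetical, and `-F'\t'` is added on the assumption that the input is strictly tab-delimited, so that a field containing spaces stays in one piece. It also closes the file right after truncating it, a small variation that avoids keeping one file open under both `>` and `>>`.

```shell
#!/bin/sh
# Hypothetical scratch directory for the demo output.
outdir=split_demo
mkdir -p "$outdir"

# Recreate the sample rows from the question, tab-delimited.
# printf reuses the format string for each group of five arguments.
printf '%s\t%s\t%s\t%s\t%s\n' \
    360.40  hockey   james april expensive \
    1200.00 hockey   james may   expensive \
    124.33  baseball liam  april cheap \
    443.12  soccer   john  may   moderate > "$outdir/input.txt"

# -F'\t' is an assumption: split on tabs only, not on any whitespace.
awk -F'\t' -v outdir="$outdir" '
    # Build the output name from columns 3 and 5.
    { f = outdir "/" $3 "-" $5 ".tsv" }
    # First time a name is seen: create or empty the file, then close it.
    !seen[f]++ { printf "" > f; close(f) }
    # Append column 1 and close again so only one file is open at a time.
    { print $1 >> f; close(f) }
' "$outdir/input.txt"
```

Running it should leave james-expensive.tsv, liam-cheap.tsv and john-moderate.tsv in `split_demo`. Closing after every write trades some speed for safety; on small inputs with few distinct names, the simpler one-liners above are enough.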
