Reputation: 65
I have a tab separated file with 5 columns. I want to print the entire line when the value is maximum in column 3 for entries listed in column 4. Example input:
Col1 Col2 Col3 Col4 Col5
123 456 22 NM01 +
675 564 21 NM01 +
342 789 12 NM01 +
231 452 56 NM02 +
456 783 178 NM02 +
Expected outcome:
123 456 22 NM01 +
456 783 178 NM02 +
Basically, print out the line when Col3 has the max value for entries in col4.
I am trying to use unix commands. I am new to this and not able to find examples of how to look up the max value in col3 for each case of col4 (ex: NM01, NM02..)
Upvotes: 2
Views: 682
Reputation: 124656
You can do this using two maps: 1 to track the maximum values per field $4
, and one to track the complete lines having the maximum values per field $4
:
awk '$3 > max[$4] { max[$4] = $3; m[$4] = $0 }
END { for (i in m) { print m[i] } }' file.txt
This will output:
Col1 Col2 Col3 Col4 Col5
123 456 22 NM01 +
456 783 178 NM02 +
If you don't want the header line, you can add a NR > 1
condition.
Upvotes: 5