Reputation: 13
My data looks like:
1 161 a
1 48 b
1 30 c
2 199 d
2 1306 d
2 1290 d
2 81 d
2 51 e
2 82 e
2 93 e
3 64 e
3 99 f
3 123 f
For every value in column three, I want to select row having maximum value in column 2 and group by column 1. So the desired output is
1 161 a
1 48 b
1 30 c
2 1306 d
2 93 e
3 64 e
3 123 f
Grouping by column one I mean I want results for all the distinct values in column 1. What I tried using is
awk '$2>max[$3]{max[$3]=$2; row[$3]=$0} END{for (i in row) print row[i]}' file
I get the following result
1 161 a
1 48 b
1 30 c
2 1306 d
2 93 e
3 123 f
But I am missing the following row in the result
3 64 e
This is because this one liner is selecting just the maximum value in column 2 for the distinct values in column 3, How do I group this by column 1 to get the desired result for each value in column 1?
Upvotes: 1
Views: 3791
Reputation: 9622
You can do this without awk
. I had to use tr -s " "
to suppress multiple whitespace separators between the fields/columns. The important line below is the second one.
cat $file | tr -s " " \
| sort -k2nr,2 | sort -k1,1 -k3,3 -u | sort -k3,3 -k1,1 -k2nr,2 \
| tr " " "\t"
Actually, the -k2nr,2
at the very end is not even necessary. You could just retain the initial sort order with --stable
instead. All roads lead to Rome.
Upvotes: 0
Reputation: 21
The easiest command to find the maximum and minimum value in the second column is something like this, respectively
sort -nrk2 data.txt | awk 'NR==1{print $2}' sort -nk2 data.txt | awk 'NR==1{print $2}'
Upvotes: 0
Reputation: 85530
Credits to sofan's
idea over in the comments, with some extra manipulations the below logic will do the trick as the OP wants.
sort -r -n -k2 < file| awk '!x[$1 FS $3]++' | sort -k1
sort -n -k2
) and reverse it (-r
) on column 2 (which now will be descending order)awk '!x[$1 FS $3]++
does the grouping of that content unique by column 1 and 3 and sort -k1
sorts/groups it by column 1 contentsUpvotes: 2
Reputation: 47099
I think you can just use a composite key of $1$3
:
awk '$2 > max[$1$3] { max[$1$3]=$2;
row[$1$3]=$0 }
END { for (i in row) print row[i] }' file
Upvotes: 5