Reputation: 13

how to select rows with max value based on one column and group by second column using awk?

My data looks like:

1   161   a
1   48    b
1   30    c
2   199   d 
2   1306  d
2   1290  d
2   81    d
2   51    e
2   82    e
2   93    e
3   64    e
3   99    f
3   123   f

For every value in column three, I want to select row having maximum value in column 2 and group by column 1. So the desired output is

1   161   a
1   48    b
1   30    c
2   1306  d
2   93    e
3   64    e
3   123   f

Grouping by column one I mean I want results for all the distinct values in column 1. What I tried using is

awk '$2>max[$3]{max[$3]=$2; row[$3]=$0} END{for (i in row) print row[i]}' file

I get the following result

1   161   a
1   48    b
1   30    c
2   1306  d
2   93    e
3   123   f

But I am missing the following row in the result

3   64  e

This is because this one liner is selecting just the maximum value in column 2 for the distinct values in column 3, How do I group this by column 1 to get the desired result for each value in column 1?

Upvotes: 1

Answers (4)

tommy.carstensen

Reputation: 9622

You can do this without awk. I had to use tr -s " " to suppress multiple whitespace separators between the fields/columns. The important line below is the second one.

cat $file | tr -s " " \
| sort -k2nr,2 | sort -k1,1 -k3,3 -u | sort -k3,3 -k1,1 -k2nr,2 \
| tr " " "\t"

Actually, the -k2nr,2 at the very end is not even necessary. You could just retain the initial sort order with --stable instead. All roads lead to Rome.

Upvotes: 0

Mojtaba

Reputation: 21

The easiest command to find the maximum and minimum value in the second column is something like this, respectively

sort -nrk2 data.txt | awk 'NR==1{print $2}' sort -nk2 data.txt | awk 'NR==1{print $2}'

Upvotes: 0

Inian

Reputation: 85895

Credits to sofan's idea over in the comments, with some extra manipulations the below logic will do the trick as the OP wants.

sort -r -n -k2 < file| awk '!x[$1 FS $3]++' | sort -k1

Idea is to sort the file in ascending order first (sort -n -k2) and reverse it (-r) on column 2 (which now will be descending order)
awk '!x[$1 FS $3]++ does the grouping of that content unique by column 1 and 3 and sort -k1 sorts/groups it by column 1 contents

Upvotes: 2

Andreas Louv

Reputation: 47127

I think you can just use a composite key of $1$3:

awk '$2 > max[$1$3] { max[$1$3]=$2;
                      row[$1$3]=$0 }
                END { for (i in row) print row[i] }' file

Upvotes: 5

how to select rows with max value based on one column and group by second column using awk?

Answers (4)

Related Questions