bash script: find max, group by and sort by column

Question

I have a file that looks like this:

b, 20, 434
a, 20, 23
a, 10, 123
a, 20, 423
a, 10, 89
b, 20, 88
b, 10, 133
b, 10, 99

Find the max of col 3 for every unique combination of col1 and col2. (e.g. max of col3 for all a,10)
Group the output by col1 (all a rows together)
Sort the output by col2.

That is, the output should be file should be

a, 10, 123
a, 20, 423
b, 10, 133
b, 20, 434

How can I do this in a bash script? Thanks for your help.

William Pursell · Accepted Answer

This will slightly modify the whitespace, but perhaps that is acceptable:

awk '$3 > a[$1,$2] { a[$1,$2] = $3 } END {for( k in a) print k a[k]}' input |
    sort -n -t, -k1,1 -k2,2

But that solution is highly dependent on the whitespace in the input, so it would probably be better to do something like:

awk '$3 > a[$1","$2] { a[$1","$2] = $3 } 
    END {for( k in a) print k "," a[k]}' FS=, input |
    sort -n -t, -k1,1 -k2,2

Answers (2)