Noz.i.kor
Noz.i.kor

Reputation: 3747

bash script: find max, group by and sort by column

I have a file that looks like this:

b, 20, 434
a, 20, 23
a, 10, 123
a, 20, 423
a, 10, 89
b, 20, 88
b, 10, 133
b, 10, 99
  1. Find the max of col 3 for every unique combination of col1 and col2. (e.g. max of col3 for all a,10)
  2. Group the output by col1 (all a rows together)
  3. Sort the output by col2.

That is, the output should be file should be

a, 10, 123
a, 20, 423
b, 10, 133
b, 20, 434

How can I do this in a bash script? Thanks for your help.

Upvotes: 3

Views: 3314

Answers (2)

Gismo Ranas
Gismo Ranas

Reputation: 6442

This does the job:

< input sort -k3,3gr | sort -k1,1 -k2,2g -u

It sorts numerically in reverse order on the third field and then sorts on the first and second field taking just the first occurrence (-u for unique).

You do not need padding, i.e. if you add to the input a line like

a, 3, 31

The output is:

a, 3, 31
a, 10, 123
a, 20, 423
b, 10, 133
b, 20, 434

Upvotes: 5

William Pursell
William Pursell

Reputation: 212238

This will slightly modify the whitespace, but perhaps that is acceptable:

awk '$3 > a[$1,$2] { a[$1,$2] = $3 } END {for( k in a) print k a[k]}' input |
    sort -n -t, -k1,1 -k2,2

But that solution is highly dependent on the whitespace in the input, so it would probably be better to do something like:

awk '$3 > a[$1","$2] { a[$1","$2] = $3 } 
    END {for( k in a) print k "," a[k]}' FS=, input |
    sort -n -t, -k1,1 -k2,2

Upvotes: 2

Related Questions