SWK

Reputation: 47

AWK: print ALL rows with the MAX value in one field per the other field, including identical rows with the max value

I am trying to keep the rows with the highest value in column 2 per column 1, including identical rows with that max value, as in the desired output below.

Data is

a   55
a   66
a   130
b   88
b   99
b   99
c   110
c   130
c   130

Desired output is

a   130
b   99
b   99
c   130
c   130

I could find great answers on this site, but none exactly for the current question.

awk '{ max=(max>$2?max:$2); arr[$2]=(arr[$2]?arr[$2] ORS:"")$0 } END{ print arr[max] }' file

yields output that includes the identical rows, but the max value is taken across all rows, not per column 1:

a       130
c       130
c       130

My second attempt was

awk '$2>max[$1] {max[$1]=$2 ; row[$1]=$0} END{for (i in row) print row[i]}' file

The output includes the max value per column 1, but does NOT include the identical rows with the max value:

a       130
b       99
c       130

Would you please help me trim the data in the desired way? All the code above came from questions and answers on this site. I appreciate that! Many thanks in advance for your help!

Upvotes: 2

Views: 244

Answers (4)

Ed Morton

Reputation: 204638

Using any awk:

awk '
    # count each (col1,col2) pair and track the max col2 value seen per col1
    { cnt[$1,$2]++; max[$1] = (max[$1]=="" || $2+0 > max[$1]+0 ? $2 : max[$1]) }
    # for each col1, print its max value as many times as that pair occurred
    END { for (key in max) { val=max[key]; for (i=1; i<=cnt[key,val]; i++) print key, val } }
' file
a 130
b 99
b 99
c 130
c 130

Upvotes: 3

Andre Wildberg

Reputation: 19271

Another way using awk. The second loop should be light; it just repeats the duplicated max values.

% awk 'arr[$1] < $2{arr[$1] = $2;                       # get max value
         co[$1]++; if(co[$1] == 1){x++; id[x] = $1}}    # count unique ids
       arr[$1] == $2{n[$1,arr[$1]]++}                   # count repeated max
       END{for(i=1; i<=x; i++){
             for(j=1; j<=n[id[i],arr[id[i]]]; j++){print id[i], arr[id[i]]}}}' file
a 130
b 99
b 99
c 130
c 130

or, if order doesn't matter

% awk 'arr[$1] < $2{arr[$1] = $2}
       arr[$1] == $2{n[$1,arr[$1]]++}
       END{for(i in arr){
             j=0; do{print i, arr[i]; j++} while(j < n[i,arr[i]])}}' file
c 130
c 130
b 99
b 99
a 130

-- EDIT --

Printing data in additional columns

% awk 'arr[$1] < $2{arr[$1] = $2}
       arr[$1] == $2{n[$1,arr[$1]]++; line[$1,arr[$1],n[$1,arr[$1]]] = $0}
       END{for(i in arr){
             j=0; do{j++; print line[i,arr[i],j]} while(j < n[i,arr[i]])}}' file
c   130 data8
c   130 data9
b   99  data5
b   99  data6
a   130 data3

Data

% cat file
a   55  data1
a   66  data2
a   130 data3
b   88  data4
b   99  data5
b   99  data6
c   110 data7
c   130 data8
c   130 data9

Upvotes: 2

dawg

Reputation: 104102

Here is a Ruby way to do that:

ruby -e '
grps=$<.read.split(/\R/).
    group_by{|line| line[/^\S+/]}
# {"a"=>["a   55", "a   66", "a   130"], "b"=>["b   88", "b   99", "b   99"], "c"=>["c   110", "c   130", "c   130"]}

maxes=grps.map{|k,v| v.max_by{|s| s.split[-1].to_f}}
# ["a   130", "b   99", "c   130"]

grps.values.flatten.each{|s| puts s if maxes.include?(s)}
' file  

Prints:

a   130
b   99
b   99
c   130
c   130

Upvotes: 2

jared_mamrot

Reputation: 26225

I've used this approach in the past:

awk 'NR==FNR{if($2 > max[$1]){max[$1]=$2}; next} max[$1] == $2' test.txt test.txt
a   130
b   99
b   99
c   130
c   130

This requires you to pass in the same file twice (i.e. awk '...' test.txt test.txt), so it's not ideal, but hopefully it provides the required output with your actual data.
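If reading the file twice is a concern, a minimal single-pass sketch along the same lines, buffering every line in memory and replaying the buffer in the END block, might look like this (it trades the second pass for holding the whole file in an array):

awk '
    { lines[NR] = $0 }                                  # buffer each input line
    !($1 in max) || $2+0 > max[$1]+0 { max[$1] = $2 }   # track the max of column 2 per column 1
    END { for (i=1; i<=NR; i++) {                       # replay buffered lines in input order
              split(lines[i], f)
              if (f[2]+0 == max[f[1]]+0) print lines[i] } }
' test.txt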

Upvotes: 3
