Reputation: 29

Find the maximum values in nth column for each distinct values in 1st column in bash

I have a 3 column file and I want to find the maximum value of the third column with rows with same first column and have also the second column in output.

Input:

1   234   0.005
1   235   0.060
1   236   0.001
2   234   0.010
2   235   0.003
2   236   0.003
3   234   0.004
3   235   0.100
3   236   0.004

Desired output:

1   235   0.060
2   234   0.010
3   235   0.100

I found this hint from previous questions but I do not know how to have also the second column:

!($1 in max) || $3>max[$1] { max[$1] = $3 }
END {
     PROCINFO["sorted_in"] = "@ind_num_asc"
     for (key in max) {
         print key, max[key]
         }
     }

Upvotes: 2

Answers (5)

Ed Morton

Reputation: 204628

$ sort -k1,1n -k3,3nr file | awk '!seen[$1]++'
1   235   0.060
2   234   0.010
3   235   0.100

Upvotes: 0

ghoti

Reputation: 46896

This should work in any modern awk (not just GNU):

$ awk '!a[$1]||$3>b[$1]{a[$1]=$0;b[$1]=$3} END {for(i in a)print a[i]}' file | sort -n

Broken out for easier reading:

!a[$1] || $3>b[$1] - If we haven't seen the first column before OR the third column beats our previous record,
{a[$1]=$0;b[$1]=$3} - then store the current line in one array, and the comparison value in another array.
END {for(i in a)print a[i]} - Once we've processed all the input, print each line from our storage array.
sort -n - sort numerically. Should work with any sort of sort.

Clear as mud?

This solution notably stores entire lines ($0) rather than the content of individual fields, so its output will be the input lines, rather than recreate the input lines. This might be useful if you're comfortable with default field splitting for the sake of collecting fields for comparison, but you have columnated or tabbed input that you want your output to match.

Upvotes: 0

James Brown

Reputation: 37464

$ sort -k1n -k3nr file | uniq -w 1
1   235   0.060
2   234   0.010
3   235   0.100

Use sort to sort on fields 1 and 3, 3 reversed. Then use uniq and compare only 1st char.

Another one using GNU awk:

$ awk '{
    a[$1][$3]=$0 }
END {   
    PROCINFO["sorted_in"]="@ind_num_asc"       # first for in ascending order
    for(i in a) {
        PROCINFO["sorted_in"]="@ind_num_desc"  # next for in descending
        for(j in a[i]) {
            print a[i][j]
            break
        }
    }
}' file
1   235   0.060
2   234   0.010
3   235   0.100

Upvotes: 1

RavinderSingh13

Reputation: 133770

Could you please try following. It should give the output in same order of Input_file's input sequence.

awk '
!a[$1]++{
  b[++count]=$1
}
{
  c[$1]=(c[$1]>$NF?c[$1]:$NF)
  d[$1]=(c[$1]>$NF?d[$1]:$1 OFS $2)
}
END{
  for(i=1;i<=count;i++){
    print d[b[i]],c[b[i]]
  }
}'  Input_file

Output will be as follows.

1 235 0.060
2 234 0.010
3 235 0.100

Explanation: Adding explanation for above code too here.

awk '
!a[$1]++{                              ##Checking condition if array named a has NO occurrence of $1 in it then do following.
  b[++count]=$1                        ##Create array b whose index is variable count with increasing value of 1 each time value is $1 for it.
}
{
  c[$1]=(c[$1]>$NF?c[$1]:$NF)          ##Creating array c value index is $1 and checking if $NF value is greater then its value then change it to $NF else no change.
  d[$1]=(c[$1]>$NF?d[$1]:$1 OFS $2)    ##Creating array d value index is $1 and checking if $NF value is greater then its value then change it to $NF else no change.
}
END{                                   ##Starting end block of awk program here.
  for(i=1;i<=count;i++){               ##Starting for loop here from i value 1 to till value of count.
    print d[b[i]],c[b[i]]              ##Printing value of array d whose index is value of b[i] and array c whose index is b[i].
  }
}' Input_file                          ##mentioning Input_file name here.

Upvotes: 0

anubhava

Reputation: 786241

You may use this awk:

awk '!($1 in max) || $3 > max[$1] { max[$1] = $3; two[$1] = $2 }
END { PROCINFO["sorted_in"] = "@ind_num_asc"
   for (i in max) print i, two[i], max[i]
}' file

1 235 0.060
2 234 0.010
3 235 0.100

Upvotes: 1

Find the maximum values in nth column for each distinct values in 1st column in bash

Answers (5)

Related Questions