Kay

Reputation: 2077

Find the maximum value in the 2nd column for each distinct value in the 1st column using Linux

I have a file with two columns, as follows:

ifile.dat
1   10
3   34
1   4
3   32
5   3
2   2
4   20
3   13
4   50
1   40
2   20

I want to find the maximum value in the 2nd column for each distinct value (1, 2, 3, 4, 5) in the 1st column.

ofile.dat
1   40
2   20
3   34
4   50
5   3

I found that someone has done this with another program, e.g. "Get the maximum values of column B per each distinct value of column A".

Upvotes: 2

Views: 3010

Answers (5)

Mojtaba

Reputation: 21

The easiest command to find the overall maximum value in the second column (across all rows, not per group) is something like this:

sort -nrk2 data.txt | awk 'NR==1{print $2}'

Upvotes: 2

Frode F

Reputation: 31

Another way is to use sort. First sort numerically on column 2 in decreasing order, then keep one line per distinct value of column 1 (GNU sort -u keeps the first line of each run of equal keys). As a one-liner:

sort -k2,2nr ifile.dat | sort -u -k1,1n
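If depending on which duplicate sort -u keeps feels fragile, one possible variant does the sorting in a single pass and lets awk pick the first line of each group. This is only a sketch: the temporary file and its contents are stand-ins for ifile.dat from the question, and the names tmp, result and seen are made up for the example.

```shell
# Hypothetical stand-in for ifile.dat, using the data from the question.
tmp=$(mktemp)
printf '%s\n' '1 10' '3 34' '1 4' '3 32' '5 3' '2 2' '4 20' '3 13' '4 50' '1 40' '2 20' > "$tmp"

# Sort by column 1 ascending, then by column 2 descending, so the first
# line of every group holds that group's maximum. The awk filter prints a
# line only the first time its column-1 value is seen.
result=$(sort -k1,1n -k2,2nr "$tmp" | awk '!seen[$1]++')

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the selection happens in awk, this form does not rely on any particular sort -u behaviour.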

Upvotes: 3

RavinderSingh13

Reputation: 133770

Assuming the values in your 1st field are consecutive integers starting from 1, here is one more solution in awk.

awk '{a[$1]=($2>a[$1])?$2:a[$1]} END{for(j=1;j<=length(a);j++){if(j in a){print j,a[j]}}}'   Input_file

Here is one more way to do the same thing.

sort -k1,1n Input_file | awk 'prev != $1 && prev{print prev, val;val=prev=""} {val=val>$2?val:$2;prev=$1} END{print prev,val}'

Upvotes: 1

Ed Morton

Reputation: 204638

When doing min/max calculations, always seed the min/max variable using the first value read:

$ cat tst.awk
!($1 in max) || $2>max[$1] { max[$1] = $2 }
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for (key in max) {
        print key, max[key]
    }
}

$ awk -f tst.awk file
1 40
2 20
3 34
4 50
5 3

The above requires GNU awk 4.0 or later for PROCINFO["sorted_in"], which controls the output order; see http://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Array-Traversal.
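To see why seeding matters, here is a small sketch that runs a seeded and an unseeded maximum on all-negative values. The input data and the variable names (tmp, seeded, unseeded) are invented for the illustration, and sort -n is used in place of PROCINFO["sorted_in"] so that it should run in any awk.

```shell
# Hypothetical input in which every value in column 2 is negative.
tmp=$(mktemp)
printf '%s\n' '1 -5' '1 -2' '2 -7' > "$tmp"

# Seeded: the first value seen for a key always initialises max[key].
seeded=$(awk '!($1 in max) || $2>max[$1] { max[$1] = $2 }
    END { for (key in max) print key, max[key] }' "$tmp" | sort -n -k1,1)

# Unseeded: an unset max[key] compares as 0, so no negative value is
# ever stored and the printed maxima come out empty.
unseeded=$(awk '$2>max[$1] { max[$1] = $2 }
    END { for (key in max) print key, max[key] }' "$tmp" | sort -n -k1,1)

printf 'seeded:\n%s\nunseeded:\n%s\n' "$seeded" "$unseeded"
rm -f "$tmp"
```

With this input, only the seeded version reports -2 for key 1 and -7 for key 2.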

Upvotes: 1

Pankrates

Reputation: 3095

awk seems a prime candidate for this task. Simply traverse your input file and keep an array indexed by the first-column values, storing the value of column 2 whenever it is larger than the currently stored value. At the end of the traversal, iterate over the array to print the indices and their corresponding values:

awk '{
    if (a[$1] < $2) {
        a[$1]=$2
    }
} END {
    for (i in a) {
        print i, a[i]
    }
}' ifile.dat

Note that the result will not be sorted numerically on the first column, but that is easily fixed if required, for example by piping the output through sort -n.
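As a rough sketch of that fix, the same awk program can be piped through sort to order the output numerically on the first column. The temporary file below is a stand-in for ifile.dat, and tmp and result are names chosen for this example only.

```shell
# Hypothetical stand-in for ifile.dat, using the data from the question.
tmp=$(mktemp)
printf '%s\n' '1 10' '3 34' '1 4' '3 32' '5 3' '2 2' '4 20' '3 13' '4 50' '1 40' '2 20' > "$tmp"

# "for (i in a)" visits keys in an unspecified order, so sort the
# printed lines numerically on column 1 afterwards.
result=$(awk '{
    if (a[$1] < $2) {
        a[$1] = $2
    }
} END {
    for (i in a) {
        print i, a[i]
    }
}' "$tmp" | sort -n -k1,1)

printf '%s\n' "$result"
rm -f "$tmp"
```

For the sample data this prints the five lines of ofile.dat in order, from 1 40 through 5 3.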

Upvotes: 5
