Reputation: 2077
I have two columns as follows
ifile.dat
1 10
3 34
1 4
3 32
5 3
2 2
4 20
3 13
4 50
1 40
2 20
What I look for is to find the maximum values in 2nd column for each 1,2,3,4,5 in 1st column.
ofile.dat
1 40
2 20
3 34
4 50
5 3
I found someone has done this using other program e.g. Get the maximum values of column B per each distinct value of column A
Upvotes: 2
Views: 3010
Reputation: 21
The easiest command to find the maximum value in the second column is something like this
sort -nrk2 data.txt | awk 'NR==1{print $2}'
Upvotes: 2
Reputation: 31
Another way is using sort. First numeric sort on column 2 decreasing and then remove non unique values of column 1, a one-liner:
sort -n -r -k 2 ifile.dat| sort -u -n -k 1
Upvotes: 3
Reputation: 133770
Considering that your 1st field will be starting from 1 if yes then try one more solution in awk also.
awk '{a[$1]=$2>a[$1]?$2:(a[$2]?a[$2]:$2);} END{for(j=1;j<=length(a);j++){if(a[j]){print j,a[j]}}}' Input_file
Adding one more way for same too here.
sort -k1 Input_file | awk 'prev != $1 && prev{print prev, val;val=prev=""} {val=val>$2?val:$2;prev=$1} END{print prev,val}'
Upvotes: 1
Reputation: 204638
When doing min/max calculations, always seed the min/max variable using the first value read:
$ cat tst.awk
!($1 in max) || $2>max[$1] { max[$1] = $2 }
END {
PROCINFO["sorted_in"] = "@ind_num_asc"
for (key in max) {
print key, max[key]
}
}
$ awk -f tst.awk file
1 40
2 20
3 34
4 50
5 3
The above uses GNU awk 4.* for PROCINFO["sorted_in"]
to control output order, see http://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Array-Traversal.
Upvotes: 1
Reputation: 3095
awk
seems a prime candidate for this task. Simply traverse your input file and keep an array indexed by the first column values and storing a value of column 2 if it is larger than the currently stored value. At the end of the traversal iterate over the array to print indices and corresponding values
awk '{
if (a[$1] < $2) {
a[$1]=$2
}
} END {
for (i in a) {
print i, a[i]
}
}' ifile.dat
Now the result will not be sorted numerically on the first column but that should be easily fixable if that is required
Upvotes: 5