Reputation: 29
I have a 3 column file and I want to find the maximum value of the third column with rows with same first column and have also the second column in output.
Input:
1 234 0.005
1 235 0.060
1 236 0.001
2 234 0.010
2 235 0.003
2 236 0.003
3 234 0.004
3 235 0.100
3 236 0.004
Desired output:
1 235 0.060
2 234 0.010
3 235 0.100
I found this hint from previous questions but I do not know how to have also the second column:
!($1 in max) || $3>max[$1] { max[$1] = $3 }
END {
PROCINFO["sorted_in"] = "@ind_num_asc"
for (key in max) {
print key, max[key]
}
}
Upvotes: 2
Views: 289
Reputation: 204628
$ sort -k1,1n -k3,3nr file | awk '!seen[$1]++'
1 235 0.060
2 234 0.010
3 235 0.100
Upvotes: 0
Reputation: 46896
This should work in any modern awk (not just GNU):
$ awk '!a[$1]||$3>b[$1]{a[$1]=$0;b[$1]=$3} END {for(i in a)print a[i]}' file | sort -n
Broken out for easier reading:
!a[$1] || $3>b[$1]
- If we haven't seen the first column before OR the third column beats our previous record,{a[$1]=$0;b[$1]=$3}
- then store the current line in one array, and the comparison value in another array.END {for(i in a)print a[i]}
- Once we've processed all the input, print each line from our storage array.sort -n
- sort numerically. Should work with any sort of sort
.Clear as mud?
This solution notably stores entire lines ($0
) rather than the content of individual fields, so its output will be the input lines, rather than recreate the input lines. This might be useful if you're comfortable with default field splitting for the sake of collecting fields for comparison, but you have columnated or tabbed input that you want your output to match.
Upvotes: 0
Reputation: 37464
$ sort -k1n -k3nr file | uniq -w 1
1 235 0.060
2 234 0.010
3 235 0.100
Use sort
to sort on fields 1 and 3, 3 reversed. Then use uniq
and compare only 1st char.
Another one using GNU awk:
$ awk '{
a[$1][$3]=$0 }
END {
PROCINFO["sorted_in"]="@ind_num_asc" # first for in ascending order
for(i in a) {
PROCINFO["sorted_in"]="@ind_num_desc" # next for in descending
for(j in a[i]) {
print a[i][j]
break
}
}
}' file
1 235 0.060
2 234 0.010
3 235 0.100
Upvotes: 1
Reputation: 133770
Could you please try following. It should give the output in same order of Input_file's input sequence.
awk '
!a[$1]++{
b[++count]=$1
}
{
c[$1]=(c[$1]>$NF?c[$1]:$NF)
d[$1]=(c[$1]>$NF?d[$1]:$1 OFS $2)
}
END{
for(i=1;i<=count;i++){
print d[b[i]],c[b[i]]
}
}' Input_file
Output will be as follows.
1 235 0.060
2 234 0.010
3 235 0.100
Explanation: Adding explanation for above code too here.
awk '
!a[$1]++{ ##Checking condition if array named a has NO occurrence of $1 in it then do following.
b[++count]=$1 ##Create array b whose index is variable count with increasing value of 1 each time value is $1 for it.
}
{
c[$1]=(c[$1]>$NF?c[$1]:$NF) ##Creating array c value index is $1 and checking if $NF value is greater then its value then change it to $NF else no change.
d[$1]=(c[$1]>$NF?d[$1]:$1 OFS $2) ##Creating array d value index is $1 and checking if $NF value is greater then its value then change it to $NF else no change.
}
END{ ##Starting end block of awk program here.
for(i=1;i<=count;i++){ ##Starting for loop here from i value 1 to till value of count.
print d[b[i]],c[b[i]] ##Printing value of array d whose index is value of b[i] and array c whose index is b[i].
}
}' Input_file ##mentioning Input_file name here.
Upvotes: 0
Reputation: 786241
You may use this awk
:
awk '!($1 in max) || $3 > max[$1] { max[$1] = $3; two[$1] = $2 }
END { PROCINFO["sorted_in"] = "@ind_num_asc"
for (i in max) print i, two[i], max[i]
}' file
1 235 0.060
2 234 0.010
3 235 0.100
Upvotes: 1