Reputation: 549
I have an input like this:
LineA parameter1 parameter2 56
LineB parameter1 parameter2 87
LineB parameter1 parameter2 56
LineB parameter1 parameter2 90
LineC parameter1 parameter2 40
I want to print each line but, if the first column ($1) is duplicated, only print the line with the highest value in the last column ($4).
So the output should look like this:
LineA parameter1 parameter2 56
LineB parameter1 parameter2 90
LineC ...
Upvotes: 0
Views: 134
Reputation: 77185
Another way in awk:
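awk '
# Key changed: print the best line of the previous group
fld1!=$1 && NR>1 {print line}
# Same key: keep the line with the larger field 4 and remember that maximum
fld1==$1 {if ($4>=fld4) {line=$0;fld4=$4} next}
# New key: start a new group
{line=$0;fld1=$1;fld4=$4;next}
END{if (NR) print line}' file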
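Note that this compares each line only against the group currently being read, so it relies on lines with the same first column being adjacent, as they are in the sample. If that is not guaranteed, one option is to sort first and let awk keep the first line per key. This is only a sketch and not part of the answer above: the `input` and `result` variables and the `seen` array are names made up for the example, and the sample is shuffled on purpose.

```shell
# Sample data from the question, shuffled so that duplicates are not adjacent.
input='LineB parameter1 parameter2 87
LineA parameter1 parameter2 56
LineB parameter1 parameter2 90
LineC parameter1 parameter2 40
LineB parameter1 parameter2 56'

# Sort by column 1, then by column 4 numerically in descending order;
# awk then prints only the first line it sees for each column-1 value.
result=$(printf '%s\n' "$input" | sort -k1,1 -k4,4nr | awk '!seen[$1]++')
printf '%s\n' "$result"
```

The trade-off is that the output comes out in sorted key order rather than in the original input order.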
Upvotes: 0
Reputation: 3756
Code for GNU awk:
awk 'BEGIN {SUBSEP=OFS} $4>a[$1,$2,$3] {a[$1,$2,$3]=$4} END {for (i in a) {print i,a[i]}}' file
Upvotes: 2
Reputation: 23394
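Try the below (assuming field 4 is > 0 throughout; a value of 0 or less never compares greater than the empty initial entry in b, so such a record would not be stored).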
Array b is used to track the highest value in field 4 for unique values in field 1. Array a (keyed by field 1) contains the corresponding record. As each record is processed, the record is added to array a and field 4 is added to array b if
1. a value is encountered in field 1 for the first time, or
2. the value in field 4 exceeds the existing value in b for the value in field 1.
Finally, array a is printed out.
awk '$4 > b[$1] {a[$1] = $0; b[$1] = $4}
END{for (x in a) {print a[x]}}'
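If field 4 can be zero or negative, or the output should follow the input order (`for (x in a)` visits keys in an unspecified order), a variant along these lines might help. It is a sketch under those assumptions rather than a drop-in replacement: the `order` array, the counter `n`, and the `input`/`result` shell variables are names introduced here, and the sample values were changed to include negatives.

```shell
# Sample with negative values to exercise the first-seen check.
input='LineA parameter1 parameter2 56
LineB parameter1 parameter2 87
LineB parameter1 parameter2 -5
LineB parameter1 parameter2 90
LineC parameter1 parameter2 -40'

result=$(printf '%s\n' "$input" | awk '
# First time this key appears: remember its position in the input.
!($1 in b) { order[++n] = $1 }
# Store the record if the key is new or field 4 beats the stored maximum.
# Adding 0 forces b to hold a number, so the comparison is numeric.
!($1 in a) || $4 > b[$1] { a[$1] = $0; b[$1] = $4 + 0 }
# Print one record per key, in order of first appearance.
END { for (i = 1; i <= n; i++) print a[order[i]] }')
printf '%s\n' "$result"
```

Testing membership with `in` avoids relying on an unset array element comparing as 0, which is what makes the positive-values assumption necessary in the shorter version.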
Upvotes: 2