user2245731

Reputation: 549

select lines with duplicate columns by a specific value

I have an input like this:

LineA parameter1 parameter2 56  
LineB parameter1 parameter2 87
LineB parameter1 parameter2 56
LineB parameter1 parameter2 90
LineC parameter1 parameter2 40  

I want to print each line but, if the first column ($1) is duplicated, only print the line with the highest value in the last column ($4).

So the output should look like this:

LineA parameter1 parameter2 56
LineB parameter1 parameter2 90
LineC ...

Upvotes: 0

Views: 134

Answers (3)

jaypal singh

Reputation: 77185

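Another way in awk (assumes lines sharing the same first column are adjacent, as in the sample):

awk '
# New key: print the best line kept for the previous key
fld1!=$1 && NR>1 {print line}
# Same key: keep the line with the larger $4 and remember that value
fld1==$1 {if ($4>=fld4) {line=$0; fld4=$4}; next}
# First line of a key: start tracking it
{line=$0;fld1=$1;fld4=$4;next}
END{if (NR) print line}' file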
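The script above only compares a line with the one before it, so it expects lines sharing a first column to be adjacent. If the input may not be grouped, one possible sketch is to let sort do the grouping and ordering first, then keep the first line per key. This swaps in a different technique (sort plus a "first seen" filter) and the function name max_per_key_sorted is made up for illustration; output comes out ordered by the first column rather than in input order.

```shell
# Hypothetical helper: sort by field 1, then by field 4 numerically descending,
# so the highest-valued line of each key comes first within its group.
max_per_key_sorted() {
  sort -k1,1 -k4,4nr "$@" |
    awk '!seen[$1]++'   # print only the first line encountered for each key
}
```

Usage would be along the lines of max_per_key_sorted file.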

Upvotes: 0

captcha

Reputation: 3756

Code for GNU awk:

awk 'BEGIN {SUBSEP=OFS} $4>a[$1,$2,$3] {a[$1,$2,$3]=$4} END {for (i in a) {print i,a[i]}}' file

Upvotes: 2

iruvar

Reputation: 23394

Try the following (assuming field 4 is > 0 throughout; a key whose only value is 0 would never be stored).

Array b tracks the highest value of field 4 seen so far for each distinct value of field 1, and array a (also keyed by field 1) holds the corresponding record. As each record is processed, a and b are updated if either (1) the value in field 1 is seen for the first time, in which case the unset b[$1] compares as 0, or (2) field 4 exceeds the value already stored in b for that key. Finally, the contents of a are printed; note that for (x in a) visits the keys in no guaranteed order.

awk '$4 > b[$1] {a[$1] = $0; b[$1] = $4}
END{for (x in a) {print a[x]}}' file
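If field 4 can be zero or negative, or the output should follow the order in which keys first appear, the same two-array idea can be extended roughly as follows. This is only a sketch: the function name max_per_key and the arrays max, best and order are names chosen here for illustration, and it keeps the first line on ties.

```shell
# Hypothetical helper: reads the named files (or stdin) and prints, for each
# distinct field 1, the line with the largest field 4, in first-seen key order.
max_per_key() {
  awk '
    # First time this key is seen: record its position, value and line.
    !($1 in max) { order[++n] = $1; max[$1] = $4; best[$1] = $0; next }
    # Key already seen: replace only if field 4 is strictly larger (numeric compare).
    $4 + 0 > max[$1] + 0 { max[$1] = $4; best[$1] = $0 }
    # Emit the kept lines in the order the keys first appeared.
    END { for (i = 1; i <= n; i++) print best[order[i]] }
  ' "$@"
}
```

It can be called as max_per_key file. Testing membership with ($1 in max) instead of comparing against an unset element is what removes the "field 4 must be positive" restriction.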

Upvotes: 2
