How can I remove the rows with the same first column and lower second column in awk?

Question

I have a list with two columns. Some rows share the same value in the first column; in that case I want to remove the rows with the lower values in the second column. Example input:

output:

fedorqui · Accepted Answer

Yes, sure:

$ awk '{a[$1]=(a[$1]<$2?$2:a[$1])} END {for (i in a) print i, a[i]}' file
1 10
2 20
3 35
4 20

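The array a[] is keyed by the first column and keeps the maximum second-column value seen so far for each key. The END block then prints every key with its maximum. Note that for (i in a) visits the keys in an unspecified order, so pipe the output through sort if the order matters.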
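
The same idea, written out as a multi-line script, might look like the sketch below. The function name max_per_key and the sample rows are made up for illustration; the comparison is the one-liner's, with sort -n added only to make the output order predictable.

```shell
# Hypothetical helper: reads "key value" rows on stdin and prints the
# largest value per key. Like the one-liner, it assumes positive values.
max_per_key() {
    awk '
    {
        # a[$1] is the largest second-column value seen so far for key $1.
        # An unset a[$1] compares as 0, so any positive value replaces it.
        if (a[$1] < $2)
            a[$1] = $2
    }
    END {
        # Traversal order of "for (i in a)" is unspecified.
        for (i in a)
            print i, a[i]
    }' | sort -n
}

# Made-up sample rows with duplicate keys 1 and 3.
printf '%s\n' '1 10' '1 5' '2 20' '3 30' '3 35' '4 20' | max_per_key
```

With these rows, each key appears once in the output, paired with its highest value (1 10, 2 20, 3 35, 4 20).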

This relies on the fact that an unset array element compares as 0 in awk. It fails, however, when every value for a given key is negative or zero: the comparison never succeeds, so the stored value stays empty. To handle that case, the script also has to check whether the key already exists in the array:

awk '{a[$1]=(($1 in a) && a[$1]>$2?a[$1]:$2)}
     END {for (i in a) print i, a[i]}' file

Test

$ cat a
1  10
2  20
3  -15
3  -5
3  -35
4  20
$ awk '{a[$1]=(($1 in a) && a[$1]>$2?a[$1]:$2)} END {for (i in a) print i, a[i]}'  a
1 10
2 20
3 -5
4 20
