Reputation: 17
I have a list with two columns. Some rows have the same value in the first column. In that case, I want to keep only the row with the highest value in the second column and remove the others.
Example input:
1 10
2 20
3 15
3 5
3 35
4 20
Expected output:
1 10
2 20
3 35
4 20
Upvotes: 0
Views: 87
Reputation: 290015
Yes, sure:
$ awk '{a[$1]=(a[$1]<$2?$2:a[$1])} END {for (i in a) print i, a[i]}' file
1 10
2 20
3 35
4 20
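Just keep populating the array a[] with the maximum value of column 2 for each value of column 1. Finally, print the result. Note that for (i in a) visits the keys in an unspecified order, so the output lines are not guaranteed to come out sorted.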
This relies on the fact that an unset value compares as 0 in awk. But it fails if all the values for a given index are negative or zero, since none of them is greater than that default. To handle this, we have to improve the script a little by also checking whether the index already exists in the array:
awk '{a[$1]=(($1 in a) && a[$1]>$2?a[$1]:$2)}
END {for (i in a) print i, a[i]}' file
$ cat a
1 10
2 20
3 -15
3 -5
3 -35
4 20
$ awk '{a[$1]=(($1 in a) && a[$1]>$2?a[$1]:$2)} END {for (i in a) print i, a[i]}' a
1 10
2 20
3 -5
4 20
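For readability, here is the same idea spelled out with comments. It is only a sketch: the function name max_by_key and the array name max are made up for illustration, and the "+ 0" and the trailing sort -n are my own additions (to force a numeric comparison and to get a stable output order), not something the one-liner above needs.

```shell
# max_by_key: hypothetical helper name, not part of the one-liner above.
# Reads "key value" lines from the given files (or stdin if none are given)
# and prints, for each key, the largest value seen.
max_by_key() {
  awk '
    {
      # Store the value if the key is new or if it beats the stored maximum.
      # The "+ 0" forces a numeric comparison (an added assumption).
      if (!($1 in max) || $2 + 0 > max[$1] + 0)
        max[$1] = $2
    }
    END {
      # for-in visits keys in an unspecified order, hence the sort below.
      for (key in max)
        print key, max[key]
    }
  ' "$@" | sort -n
}

# Same rows as the test file "a" above, fed through stdin.
printf '%s\n' '1 10' '2 20' '3 -15' '3 -5' '3 -35' '4 20' | max_by_key
```

With that input it should print the same four lines as above (1 10, 2 20, 3 -5, 4 20). Because the key is tested with "in" before the comparison, a key whose values are all negative or zero is handled like any other.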
Upvotes: 2