Reputation: 375
I have a TAB delimited table like this (the first line is header):
symbol value chr start end
Arrb1 10 chr1 1000 2000
Arrb1 20 chr1 1000 2000
Arrb1 30 chr1 1000 2000
Myc 5 chr2 3000 4000
Actin 3 chr4 25000 30000
Actin 5 chr4 25000 30000
.
.
.
I want to unique the table by the first column(symbol), and if there are multiple lines for the same symbol, keep the line with biggest value (column 2). So the result should look like:
symbol value chr start end
Arrb1 30 chr1 1000 2000
Myc 5 chr2 3000 4000
Actin 5 chr4 25000 30000
.
.
.
Can I do it using AWK? Thanks!
Upvotes: 0
Views: 118
Reputation: 45243
If no header. I provide a shorter one.
sort -k1,1 -k2,2nr file |awk '!a[$1]++'
Upvotes: 1
Reputation: 195079
awk -F'\t' 'NR==1{print}
NR>1{if(b[$1]<$2){ a[$1]=$0; b[$1]=$2 }}
END{for(x in a)print a[x]}' file
Upvotes: 1