Runner
Runner

Reputation: 375

How to use AWK to unique a table (keep the biggest value for each unique ID)?

I have a TAB delimited table like this (the first line is header):

  symbol    value   chr start   end
  Arrb1 10  chr1    1000    2000
  Arrb1 20  chr1    1000    2000
  Arrb1 30  chr1    1000    2000
  Myc   5   chr2    3000    4000
  Actin 3   chr4    25000   30000
  Actin 5   chr4    25000   30000
    .
    .
    .

I want to unique the table by the first column(symbol), and if there are multiple lines for the same symbol, keep the line with biggest value (column 2). So the result should look like:

  symbol    value   chr start   end
  Arrb1 30  chr1    1000    2000
  Myc   5   chr2    3000    4000
  Actin 5   chr4    25000   30000
    .
    .
    .

Can I do it using AWK? Thanks!

Upvotes: 0

Views: 118

Answers (2)

BMW
BMW

Reputation: 45243

If no header. I provide a shorter one.

sort -k1,1 -k2,2nr file |awk '!a[$1]++'

Upvotes: 1

Kent
Kent

Reputation: 195079

awk -F'\t' 'NR==1{print}
 NR>1{if(b[$1]<$2){ a[$1]=$0; b[$1]=$2 }}
 END{for(x in a)print a[x]}' file

Upvotes: 1

Related Questions