Reputation: 28329
I need some help with text manipulation.
I have data like this:
29554 31109 "ENSG00000243485.1" 1555
29554 31097 "ENSG00000243485.1" 1543
29554 30039 "ENSG00000243485.1" 485
30564 30667 "ENSG00000243485.1" 103
30267 30667 "ENSG00000243485.1" 400
30976 31109 "ENSG00000243485.1" 133
89295 133566 "ENSG00000238009.2" 44271
89295 120932 "ENSG00000238009.2" 31637
120775 120932 "ENSG00000238009.2" 157
112700 112804 "ENSG00000238009.2" 104
92091 92240 "ENSG00000238009.2" 149
28269867 28269929 "ENSG00000248451.1" 62
28270383 28270486 "ENSG00000248451.1" 103
28273195 28273372 "ENSG00000248451.1" 177
28275308 28275354 "ENSG00000248451.1" 46
.....................
I have to print the line with the biggest value per group.
The group name is in column 3 and the values are in column 4.
As I imagine it, the steps would be:
1. Separating the groups from each other;
2. Selecting the biggest value;
3. Printing the whole line.
The preferred output for this example would be:
29554 31109 "ENSG00000243485.1" 1555
89295 133566 "ENSG00000238009.2" 44271
28273195 28273372 "ENSG00000248451.1" 177
I hope someone can help me with this in awk or sed.
Upvotes: 0
Views: 76
Reputation: 4384
This should do it in bash and awk:
# lowercase "groups": bash silently ignores assignments to the special GROUPS variable
groups=$(cut -d' ' -f3 datafile | uniq)  # list of groups (uniq suffices because equal groups are adjacent)
for g in $groups                         # unquoted so the loop sees one group per iteration
do
  # print the line of this group whose 4th field is the maximum
  awk -v grp="$g" '$3 == grp && $4 > max {max = $4; line = $0} END {print line}' datafile
done
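Assuming the sample data is in datafile, this should print the three preferred lines in the order the groups first appear. Note that it rereads datafile once per group, so it does one full pass over the file for every group.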
Upvotes: 1
Reputation: 246774
You only need to pass through the file once with awk:
awk '
$4 > val[$3] {val[$3] = $4; line[$3] = $0}   # keep the best line seen so far for this group
END {for (grp in line) print line[grp]}      # print the winning line of each group
' filename
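The for (grp in line) loop visits the groups in an unspecified order. If the output must keep the order in which the groups first appear in the file, a minimal variant of the same one-pass idea (the extra order/n names here are just illustrative) could be:
awk '
!($3 in val) {order[++n] = $3}                 # remember each group the first time it is seen
$4 > val[$3] {val[$3] = $4; line[$3] = $0}     # keep the best line seen so far for this group
END {for (i = 1; i <= n; i++) print line[order[i]]}
' filename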
Upvotes: 2
Reputation: 58371
This might work for you (number each line, sort by group and then by value in descending numeric order, keep only the first line of each group, restore the original line order, then strip the line numbers):
cat -n file | sort -k4,4 -k5,5nr | sort -u -k4,4 | sort -n | cut -f2-
Upvotes: 1