Reputation: 28329
I need some help with text manipulation.
I have data like this:
29554 31109 "ENSG00000243485.1" 1555
29554 31097 "ENSG00000243485.1" 1543
29554 30039 "ENSG00000243485.1" 485
30564 30667 "ENSG00000243485.1" 103
30267 30667 "ENSG00000243485.1" 400
30976 31109 "ENSG00000243485.1" 133
89295 133566 "ENSG00000238009.2" 44271
89295 120932 "ENSG00000238009.2" 31637
120775 120932 "ENSG00000238009.2" 157
112700 112804 "ENSG00000238009.2" 104
92091 92240 "ENSG00000238009.2" 149
28269867 28269929 "ENSG00000248451.1" 62
28270383 28270486 "ENSG00000248451.1" 103
28273195 28273372 "ENSG00000248451.1" 177
28275308 28275354 "ENSG00000248451.1" 46
.....................
I have to print the line with the biggest value per group.
The group name is in column 3 and the values are in column 4.
As I imagine it, the steps would be:
1. Separating the groups from each other;
2. Selecting the biggest value;
3. Printing the whole line.
The preferred output for this example would be:
29554 31109 "ENSG00000243485.1" 1555
89295 133566 "ENSG00000238009.2" 44271
28273195 28273372 "ENSG00000248451.1" 177
I hope someone can help me with this in awk or sed.
Upvotes: 0
Views: 76
Reputation: 4384
This should do it in bash and awk:
# lowercase "groups": bash silently ignores assignments to the special GROUPS variable
groups=$(cut -d' ' -f3 datafile | uniq)  # list of groups (uniq suffices because equal groups are adjacent)
for g in $groups                         # unquoted so the loop sees one group per iteration
do
  # print the line of this group whose 4th field is the maximum
  awk -v grp="$g" '$3 == grp && $4 > max {max = $4; line = $0} END {print line}' datafile
done
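Assuming the sample data is in datafile, this should print the three preferred lines in the order the groups first appear. Note that it rereads datafile once per group, so it does one full pass over the file for every group.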
Upvotes: 1
Reputation: 246774
You only need to pass through the file once with awk:
awk '
$4 > val[$3] {val[$3] = $4; line[$3] = $0}   # keep the best line seen so far for this group
END {for (grp in line) print line[grp]}      # print the winning line of each group
' filename
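The for (grp in line) loop visits the groups in an unspecified order. If the output must keep the order in which the groups first appear in the file, a minimal variant of the same one-pass idea (the extra order/n names here are just illustrative) could be:
awk '
!($3 in val) {order[++n] = $3}                 # remember each group the first time it is seen
$4 > val[$3] {val[$3] = $4; line[$3] = $0}     # keep the best line seen so far for this group
END {for (i = 1; i <= n; i++) print line[order[i]]}
' filename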
Upvotes: 2
Reputation: 58371
This might work for you (number each line, sort by group and then by value in descending numeric order, keep only the first line of each group, restore the original line order, then strip the line numbers):
cat -n file | sort -k4,4 -k5,5nr | sort -u -k4,4 | sort -n | cut -f2-
Upvotes: 1