Reputation: 55
I have multiple files with the following pattern:
ABCD 100
ABCD 200
EFGH 500
IJKL 50
EFGH 700
ABCD 800
IJKL 100
I want to output each key (ABCD/EFGH/IJKL) only once, with the highest number found for it in column 2, sorted by that number:
ABCD 800
EFGH 700
IJKL 100
I tried cat *txt | sort -k 1 | ?? but I don't know what should come next.
Thanks in advance.
My bad for not being explicit; apologies for wasting your time. Below is a more detailed example. The file has multiple columns. I extracted the ones I need using awk and tried cat *txt | awk '{print $3,$5}' | sort -gr | less. Now the strings are sorted by their numeric value. How do I keep only the first match for each unique string?
<string> <numeral>
abcde/efgh/ijkl/mnop -450.00
dfgh/adas/gfda/adasd -100.0
abcde/efgh/ijkl/mnop -100.00
lk/oiojl/ojojl -0.078
dfgh/adas/gfda/adasd 50.0
lk/oiojl/ojojl -0.150
Output needed:
abcde/efgh/ijkl/mnop -450.00
dfgh/adas/gfda/adasd -100.0
lk/oiojl/ojojl -0.150
Upvotes: 4
Views: 223
Reputation: 26121
perl -anE'$h{$F[0]}=$F[1]if!exists$h{$F[0]}or$F[1]>$h{$F[0]}}{say"$_ $h{$_}"for keys%h'
Upvotes: 0
Reputation: 531275
You can use sort twice: once to sort on the numbers, and a second time to do a stable sort on the strings (so that the largest number remains first), removing duplicates to discard duplicate strings with smaller numbers.
sort -k2,2nr file.txt | sort -k1,1 -u --stable
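A minimal sketch of the two-sort approach wrapped in a function so it can be tried on the sample data from the question. The name highest_per_key is hypothetical, and the sketch assumes GNU sort (for --stable).

```shell
# Hypothetical helper: keep the highest column-2 value for each column-1 key.
# Reads the files given as arguments, or stdin if there are none.
highest_per_key() {
    # 1st sort: numeric, descending, on column 2 only.
    # 2nd sort: stable and unique on column 1 only, so the first line of
    # each key (the one with the largest number) is the one that survives.
    sort -k2,2nr "$@" | sort -k1,1 -u --stable
}

# Sample data from the question; prints ABCD 800, EFGH 700, IJKL 100.
printf '%s\n' 'ABCD 100' 'ABCD 200' 'EFGH 500' 'IJKL 50' \
    'EFGH 700' 'ABCD 800' 'IJKL 100' | highest_per_key
```

Because the first sort compares column 2 numerically, a value such as 1000 correctly beats 800 even though it has more digits.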
Upvotes: 3
Reputation: 12123
If the first column is always 4 characters, then (per abasu's suggestion) you can use uniq -w4.
cat *.txt | sort -k1,1 -k2,2gr | uniq -w4
This groups the lines by the first column and sorts each group in reverse numeric order on the second column ('ABCD 800' will precede 'ABCD 100'); uniq -w4 then only considers the first 4 characters when finding unique rows, so only the highest line of each group is kept.
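A small sketch of the uniq -w4 approach, assuming GNU uniq (-w is not POSIX). The function name highest_per_key4 is hypothetical, and the explicit sort keys (-k1,1 -k2,2gr) are an assumption about how to spell the sort so that only column 2 is compared numerically.

```shell
# Hypothetical helper for keys that are exactly 4 characters wide.
highest_per_key4() {
    # Group lines by column 1, then order each group by column 2,
    # largest number first; uniq -w4 keeps the first line per 4-char key.
    sort -k1,1 -k2,2gr "$@" | uniq -w4
}

# Sample data from the question; prints ABCD 800, EFGH 700, IJKL 100.
printf '%s\n' 'ABCD 100' 'ABCD 200' 'EFGH 500' 'IJKL 50' \
    'EFGH 700' 'ABCD 800' 'IJKL 100' | highest_per_key4
```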
If the first column is not always 4 characters, you can pipe back and forth through rev and use uniq -f1 to skip the first of the reversed fields (the reversed number), so that only the key is compared.
cat *.txt | sort -k1,1 -k2,2gr | rev | uniq -f1 | rev
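A sketch of the rev variant with keys of different lengths, where uniq -w4 would not work. It assumes the rev utility is available (it ships with util-linux and the BSDs but is not POSIX) and that keys contain no blanks; highest_per_key_rev is a hypothetical name.

```shell
# Hypothetical helper for keys of arbitrary length (no blanks inside keys).
highest_per_key_rev() {
    # After rev, the number becomes field 1, so uniq -f1 skips it and
    # compares only the (reversed) key; the first line per key is kept.
    sort -k1,1 -k2,2gr "$@" | rev | uniq -f1 | rev
}

# Keys "ab" and "abc" share a prefix; prints ab 9 and abc 7.
printf '%s\n' 'ab 5' 'abc 3' 'ab 9' 'abc 7' | highest_per_key_rev
```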
If you want to target a specific word and get its highest corresponding number, you can use
cat *.txt | grep 'ABCD' | sort -k2,2gr | head -n 1
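A sketch of the single-key lookup as a hypothetical function highest_for. Unlike a bare grep 'ABCD', it anchors the pattern to the start of the line and requires a following space, so a key such as ABCDX is not matched; the key is still interpreted as a regular expression, which is an assumption that holds for plain alphanumeric keys.

```shell
# Hypothetical helper: highest line for one key.
# Usage: highest_for KEY [FILE...]   (reads stdin if no files are given)
highest_for() {
    pattern="^$1 "   # key at start of line, followed by a space
    shift
    # cat merges the files without the filename prefixes grep would add.
    cat "$@" | grep -- "$pattern" | sort -k2,2gr | head -n 1
}

# Prints ABCD 1000 (ABCDX is ignored, 1000 beats 800 numerically).
printf '%s\n' 'ABCD 100' 'ABCD 800' 'ABCD 1000' 'ABCDX 9999' | highest_for ABCD
```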
Upvotes: 1
Reputation: 50647
cat *txt | perl -ane 'END{print "$_ $r{$_}\n" for sort keys %r} (!defined($_) or $_ < $F[1]) and $_ = $F[1] for $r{$F[0]}'
Upvotes: 2
Reputation: 121397
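You can use an awk associative array to keep the largest value seen for each key (recording a key the first time it appears, so negative values also work) and then sort on column 2:
awk '{ if (!($1 in arr) || $2 > arr[$1]) arr[$1] = $2 } END { for (i in arr) print i, arr[i] }' file \
| sort -k2,2rn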
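A sketch of the associative-array approach as a hypothetical function highest_per_key_awk. It stores a key the first time it is seen, so keys whose values are all negative (as in the question's second example) are still reported, and it adds 0 to both sides of the comparison to force a numeric comparison while printing the original text of the number unchanged.

```shell
# Hypothetical helper: one pass in awk, then order the result by column 2.
highest_per_key_awk() {
    awk '
        # New key, or a numerically larger value for a known key: keep it.
        !($1 in best) || $2 + 0 > best[$1] + 0 { best[$1] = $2 }
        # Array traversal order is unspecified, hence the sort below.
        END { for (k in best) print k, best[k] }
    ' "$@" | sort -k2,2nr
}

# Sample data from the question; prints ABCD 800, EFGH 700, IJKL 100.
printf '%s\n' 'ABCD 100' 'ABCD 200' 'EFGH 500' 'IJKL 50' \
    'EFGH 700' 'ABCD 800' 'IJKL 100' | highest_per_key_awk
```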
Upvotes: 2