Reputation: 55

Need to match a pattern occurrence only once in a file

I have multiple files containing lines like these:

ABCD  100
ABCD   200
EFGH    500
IJKL      50
EFGH    700
ABCD    800
IJKL    100

I want to match each key (ABCD/EFGH/IJKL) only once, keeping the row with the highest number in column 2:

ABCD   800
EFGH    700
IJKL    100

I tried cat *txt | sort -k 1 |??

Thanks in advance.

My bad for not being explicit; apologies for wasting your time. Below is a more detailed example. The files have multiple columns. I extracted the ones I need with awk and tried cat *txt | awk '{print $3,$5}' | sort -gr | less. Now the strings are sorted by numeric value. How do I keep only the first match for each unique string?

<string>                <numeral>
abcde/efgh/ijkl/mnop    -450.00
dfgh/adas/gfda/adasd    -100.0
abcde/efgh/ijkl/mnop     -100.00
lk/oiojl/ojojl           -0.078
dfgh/adas/gfda/adasd   50.0
lk/oiojl/ojojl       -0.150

Output needed:
abcde/efgh/ijkl/mnop     -450.00
dfgh/adas/gfda/adasd    -100.0
lk/oiojl/ojojl       -0.150

Upvotes: 4

Answers (5)

Hynek -Pichi- Vychodil

Reputation: 26121

perl -anE '$h{$F[0]} = $F[1] if !exists $h{$F[0]} or $F[1] > $h{$F[0]} }{ say "$_ $h{$_}" for keys %h'

Upvotes: 0

chepner

Reputation: 531275

You can use sort twice: once to sort on the numbers, a second time to do a stable sort on the strings (so that the largest number remains first), removing duplicates to discard duplicate strings with smaller numbers.

sort -k2,2nr file.txt | sort -k1,1 -u --stable
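
As a quick check of the two-pass idea, here is a small sketch run against the first sample from the question. The rows are inlined and normalized to single spaces, LC_ALL=C is set only to make the ordering reproducible, -s is the short form of --stable, and the variable names data and result are just for illustration.

```shell
# Sample rows from the question, inlined so the snippet is self-contained.
data='ABCD 100
ABCD 200
EFGH 500
IJKL 50
EFGH 700
ABCD 800
IJKL 100'

# Pass 1: order by column 2, numerically, largest first.
# Pass 2: stable sort on column 1 with -u, so only the first row seen
#         for each key (the one with the largest number) is kept.
result=$(printf '%s\n' "$data" \
  | LC_ALL=C sort -k2,2nr \
  | LC_ALL=C sort -s -u -k1,1)

printf '%s\n' "$result"
```

With GNU sort this should print ABCD 800, EFGH 700 and IJKL 100, one per line, which matches the output asked for in the question.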

Upvotes: 3

ktm5124

Reputation: 12123

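If the first column is always 4 characters, then (per abasu's suggestion) you can use uniq -w4.

cat *.txt | sort -k1,1 -k2,2gr | uniq -w4

This groups the rows by the first column and, within each group, sorts numerically in reverse on column 2 ('ABCD 800' will precede 'ABCD 100'). A bare sort -gr would compare whole lines, which do not parse as numbers here. uniq then considers only the first 4 characters when finding unique rows, so the first row of each group is kept.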

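If the first column is not always 4 characters, you can pipe back and forth to rev, and use uniq -f1 to skip the first of the reversed fields (the number). This assumes the columns are separated by the same whitespace on every line, because uniq still compares the whitespace that precedes the remaining field.

cat *.txt | sort -k1,1 -k2,2gr | rev | uniq -f1 | rev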

If you want to target a specific word and get its highest corresponding number, you can use

cat *.txt | grep '^ABCD' | sort -k2,2gr | head -n 1
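
For keys of varying width, another option (not part of the answer above, just a sketch) is to sort on the number first and let awk keep the first row it sees for each key. The array name seen and the variables data and result are illustrative; the rows are the second sample from the question with the spacing normalized.

```shell
# Second sample from the question (header line omitted, single spaces).
data='abcde/efgh/ijkl/mnop -450.00
dfgh/adas/gfda/adasd -100.0
abcde/efgh/ijkl/mnop -100.00
lk/oiojl/ojojl -0.078
dfgh/adas/gfda/adasd 50.0
lk/oiojl/ojojl -0.150'

# Sort by column 2, largest first, then print a row only the first
# time its column-1 key appears: seen[$1]++ is 0 (false) on first sight.
result=$(printf '%s\n' "$data" \
  | LC_ALL=C sort -k2,2nr \
  | awk '!seen[$1]++')

printf '%s\n' "$result"
```

This keeps the highest number per string, in descending numeric order; changing nr to n would keep the lowest instead.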

Upvotes: 1

mpapec

Reputation: 50647

cat *txt | perl -ane 'END{print "$_ $r{$_}\n" for sort keys %r} $_<$F[1] and $_=$F[1] for $r{$F[0]}'

Upvotes: 2

P.P

Reputation: 121397

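You can use awk's associative array and then sort based on column 2. The ($1 in arr) test makes sure the first value for a key is stored even when it is negative:

awk '{ if (!($1 in arr) || $2+0 > arr[$1]+0) arr[$1]=$2 } END{for (i in arr) print i, arr[i]}' file \
| sort -k2 -rn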
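
The second sample in the question appears to want the lowest value per string (this is an interpretation; the question does not say so explicitly). Under that assumption, the same associative-array approach can be sketched with the comparison flipped. The names low, data and result are illustrative only.

```shell
# Second sample from the question (header line omitted, single spaces).
data='abcde/efgh/ijkl/mnop -450.00
dfgh/adas/gfda/adasd -100.0
abcde/efgh/ijkl/mnop -100.00
lk/oiojl/ojojl -0.078
dfgh/adas/gfda/adasd 50.0
lk/oiojl/ojojl -0.150'

# Store a value the first time a key is seen, or whenever a numerically
# smaller value turns up; "+ 0" forces a numeric comparison.
# The END loop order is unspecified, so the final sort fixes the order.
result=$(printf '%s\n' "$data" \
  | awk '!($1 in low) || $2 + 0 < low[$1] + 0 { low[$1] = $2 }
         END { for (k in low) print k, low[k] }' \
  | LC_ALL=C sort -k2,2n)

printf '%s\n' "$result"
```

On this input the result should be the three rows listed under "O/p needed" in the question: -450.00, -100.0 and -0.150 for the three strings, in ascending order.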

Upvotes: 2
