littleworth
littleworth

Reputation: 5169

How to pick top value from tie-break group using Unix sort

I have the following text file:

$ cat myfile.txt
foo,a,10
bar,c,33
foo,b,50
bar,a,9
foo,a,20
bar,b,20

What I want to do is to sort by 2nd column and 3rd (numeric descending) and finally pick the top of each 2nd column group, resulting in

foo,a,20
foo,b,50
bar,c,33

I'm stuck with this:

$ sort -t"," -k2 -k3r   test.txt
foo,a,10
foo,a,20
bar,a,9
bar,b,20
foo,b,50
bar,c,33

Whats the right way to do it?

Upvotes: 2

Views: 340

Answers (3)

mwp
mwp

Reputation: 8467

Your sort command is close. -k actually takes a range of fields, so your -k2 is being interpreted as "sort on everything from the second field to the end of the line," and it's ignoring the -k3 entirely. You have to be explicit about where the ranges start and end. You also want to sort on the third numerically.

Here's all that put together:

$ sort -t, -k2,2 -k3,3nr myfile.txt
foo,a,20
foo,a,10
bar,a,9
foo,b,50
bar,b,20
bar,c,33

Now you want to pick the first line of each group, based on the second field. Although sort has the ability to dedup a stream, it's not smart enough to handle complex scenarios like this. Fortunately, we have awk:

$ sort -t, -k2,2 -k3,3nr myfile.txt | awk -F, 'x != $2 { print; x = $2 }'
foo,a,20
foo,b,50
bar,c,33

Upvotes: 6

Calvin Taylor
Calvin Taylor

Reputation: 694

awk smauk

PURE BASH!!

regex=',(.+),'
var="xx"
for line in $(sort -t, -k2,2 -k3,3nr myfile.txt); do
  if [[ $line =~ $regex ]]; then
    bashrematch=${BASH_REMATCH[1]}
    if [[ "$var" != "$bashrematch" ]]; then
      var=$bashrematch
      echo $line
    fi
  fi
done
foo,a,20
foo,b,50
bar,c,33

just to compress and shrink the code a bit

r=',(.+),'; v=""; for l in $(sort -t, -k2,2 -k3,3nr myfile.txt); do [[ $l =~ $r ]] && b=${BASH_REMATCH[1]} && [ "$v" != "$b" ] && v=$b && echo $l; done
foo,a,20
foo,b,50
bar,c,33

Upvotes: -1

CWLiu
CWLiu

Reputation: 4043

Here's pure awk method to do that,

 awk -F, '{split(a[$2],b,",");if(b[3]<$3)a[$2]=$0}END{for(i in a)print [i]}' myfile.txt

Brief explanation,

  • split(a[$2],b,","): split each record separated by ',', and save each field to array b.
  • if(b[3]<$3)a[$2]=$0: compare b[3] and $3 to determine if a[$2] needs to be updated
  • Print all content in array a in the last.

Upvotes: 0

Related Questions