Reputation: 5169
I have the following text file:
$ cat myfile.txt
foo,a,10
bar,c,33
foo,b,50
bar,a,9
foo,a,20
bar,b,20
What I want to do is to sort by 2nd column and 3rd (numeric descending) and finally pick the top of each 2nd column group, resulting in
foo,a,20
foo,b,50
bar,c,33
I'm stuck with this:
$ sort -t"," -k2 -k3r test.txt
foo,a,10
foo,a,20
bar,a,9
bar,b,20
foo,b,50
bar,c,33
Whats the right way to do it?
Upvotes: 2
Views: 340
Reputation: 8467
Your sort
command is close. -k
actually takes a range of fields, so your -k2
is being interpreted as "sort on everything from the second field to the end of the line," and it's ignoring the -k3
entirely. You have to be explicit about where the ranges start and end. You also want to sort on the third numerically.
Here's all that put together:
$ sort -t, -k2,2 -k3,3nr myfile.txt
foo,a,20
foo,a,10
bar,a,9
foo,b,50
bar,b,20
bar,c,33
Now you want to pick the first line of each group, based on the second field. Although sort
has the ability to dedup a stream, it's not smart enough to handle complex scenarios like this. Fortunately, we have awk:
$ sort -t, -k2,2 -k3,3nr myfile.txt | awk -F, 'x != $2 { print; x = $2 }'
foo,a,20
foo,b,50
bar,c,33
Upvotes: 6
Reputation: 694
awk smauk
PURE BASH!!
regex=',(.+),'
var="xx"
for line in $(sort -t, -k2,2 -k3,3nr myfile.txt); do
if [[ $line =~ $regex ]]; then
bashrematch=${BASH_REMATCH[1]}
if [[ "$var" != "$bashrematch" ]]; then
var=$bashrematch
echo $line
fi
fi
done
foo,a,20
foo,b,50
bar,c,33
just to compress and shrink the code a bit
r=',(.+),'; v=""; for l in $(sort -t, -k2,2 -k3,3nr myfile.txt); do [[ $l =~ $r ]] && b=${BASH_REMATCH[1]} && [ "$v" != "$b" ] && v=$b && echo $l; done
foo,a,20
foo,b,50
bar,c,33
Upvotes: -1
Reputation: 4043
Here's pure awk
method to do that,
awk -F, '{split(a[$2],b,",");if(b[3]<$3)a[$2]=$0}END{for(i in a)print [i]}' myfile.txt
Brief explanation,
split(a[$2],b,",")
: split each record separated by ',', and save each field to array b.if(b[3]<$3)a[$2]=$0
: compare b[3] and $3 to determine if a[$2] needs to be updatedUpvotes: 0