Vinay V

Reputation: 51

How to sort the columns with below requirement

I have 3 columns

a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z

Expected output is

a 10 x
b 20 w
c 12 z

That is, I need to sort by column 2 without changing the order of column 1, and then pick, for each value of column 1, the line with the maximum value in column 2.

Upvotes: 4

Views: 134

Answers (4)

RomanPerekhrest

Reputation: 92854

Two approaches (choose one you like):

1) sort + uniq "trick":

sort -k1,1 -k2,2rn file | uniq -w1
  • -k1,1 - sort lines by the 1st field first

  • -k2,2rn - then sort lines by the 2nd field numerically, in reverse (descending) order

  • uniq -w1 - keep only the first line of each run, comparing no more than the first character of each line (-w is a GNU uniq option; adjust it with -w<number> if the key in column 1 is longer)

The output:

a 10 x
b 20 w
c 12 z

2) Simply with GNU datamash tool:

datamash -Wsf -g1 max 2 <file | cut -f1-3

The output:

a   10  x
b   20  w
c   12  z

Upvotes: 4

RavinderSingh13

Reputation: 133600

You could also try the following:

awk '
{
  b[$1]=a[$1]>$2?(b[$1]?b[$1]:$0):$0;
  a[$1]=a[$1]>$2?a[$1]:$2;
}
END{
  for(i in a){
     print b[i]
  }
}
'   Input_file

Explanation:

awk '
{                                    ##Main block, run for every input line.
  b[$1]=a[$1]>$2?(b[$1]?b[$1]:$0):$0;##Array b, indexed by $1, holds the best line seen so far: if the stored maximum a[$1] is greater than $2, keep b[$1] (or take the current line if b[$1] is still empty); otherwise store the current line.
  a[$1]=a[$1]>$2?a[$1]:$2;           ##Array a, indexed by $1, holds the maximum $2 seen so far: keep a[$1] if it is greater than $2, otherwise replace it with the current $2.
}
END{                                 ##END block, run after all input has been read.
  for(i in a){                       ##Loop over the indexes of array a (awk does not guarantee the iteration order).
     print b[i]                      ##Arrays a and b share the same indexes, so print the full line stored in b.
  }
}
'  Input_file                        ##Name of the input file.
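
Since for(i in a) does not promise any particular order, and the question asks to keep the order of column 1, here is a minimal sketch of a variant that prints the keys in the order they first appear. The function name max_per_key and the arrays order, max and line are my own names for illustration, not part of the answer above.

```shell
# Hypothetical helper: print, for each key in column 1, the line with the
# largest numeric value in column 2, keeping keys in first-seen order.
max_per_key() {
  awk '
    !($1 in max) { order[++n] = $1 }        # remember each key the first time it appears
    !($1 in max) || $2 + 0 > max[$1] {      # new key, or a larger value than the stored one
      max[$1] = $2 + 0                      # $2 + 0 forces a numeric value
      line[$1] = $0
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}

printf '%s\n' 'a 03 w' 'a 10 x' 'a 01 y' 'b 20 w' 'b 01 x' 'c 02 w' 'c 10 y' 'c 12 z' | max_per_key
# a 10 x
# b 20 w
# c 12 z
```

With a file, it would be called as max_per_key Input_file. On ties, this sketch keeps the first line that reached the maximum.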

Upvotes: 1

hek2mgl

Reputation: 158080

You can use the UNIX commands sort and awk:

sort -k1,1 -k2,2nr file | awk '!seen[$1]++'

To apply them to the buffer in vim:

:%!sort -k1,1 -k2,2nr | awk '\!seen[$1]++'

Explanation:

The sort command sorts the input on two levels: first on column 1, then on column 2 numerically in descending order. That gives you the following intermediate output:

a 10 x
a 03 w
a 01 y
b 20 w
b 01 x
c 12 z
c 10 y
c 02 w

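We pipe that to a little awk script which maintains an array variable seen, indexed by column 1, that counts how often each value has occurred. Since the test is negated by !, a line is printed only the first time its column 1 value appears; once we've seen a value before, we won't print it again: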

a 10 x  <-- print
a 03 w
a 01 y
b 20 w  <-- print
b 01 x
c 12 z  <-- print
c 10 y
c 02 w
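
To make the counter visible, here is a small sketch that prints the value of seen[$1] that the test looks at on each line. The function name trace_seen is only for illustration and is not part of the original command.

```shell
# Hypothetical helper: show the counter that !seen[$1]++ tests on each line.
trace_seen() {
  awk '{
    # seen[$1] is 0 (empty) on the first occurrence of a key, so !seen[$1] is true
    printf "%s  seen=%d  print=%s\n", $0, seen[$1], (seen[$1] ? "no" : "yes")
    seen[$1]++   # post-increment: the key counts as seen from the next line on
  }' "$@"
}

printf '%s\n' 'a 10 x' 'a 03 w' 'b 20 w' 'b 01 x' | trace_seen
# a 10 x  seen=0  print=yes
# a 03 w  seen=1  print=no
# b 20 w  seen=0  print=yes
# b 01 x  seen=1  print=no
```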

Upvotes: 1

Akshay Hegde

Reputation: 16997

Input

$ cat infile
a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z

Output

$ awk -F'[[:blank:]]' '{f=($1 in b)}f && b[$1]<$2 || !f{a[$1]=$0;b[$1]=$2}END{for(i in a)print a[i]}' infile
a 10 x
b 20 w
c 12 z

More readable

awk -F'[[:blank:]]' '
  {
    f = ($1 in b)
  }
  f && b[$1] < $2 || !f {
    a[$1] = $0
    b[$1] = $2
  }
  END {
    for (i in a)
      print a[i]
  }
' infile

Explanation

  • -F'[[:blank:]]' - sets the input field separator to a single space or tab

  • f=($1 in b) - variable f holds a boolean (true=1/false=0) telling whether the key $1 already exists in array b

  • f && b[$1]<$2 || !f - true if the key exists and its stored value b[$1] is less than the 2nd column of the current line, or (||) if the key has not been seen yet (!f); when the condition is true:

  • a[$1]=$0; - array a, keyed by the first column ($1) of the current line, stores the entire line ($0)

  • b[$1]=$2 - array b, keyed by the first column ($1) of the current line, stores the 2nd field ($2)

  • END { for(i in a) print a[i] } - the END block loops through array a and prints the stored lines

Note: modify -F'...' to match the field separator of your file.
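
As an illustration of changing the separator, here is a sketch that assumes a comma-separated version of the same data. The function name max_per_key_csv and the sample input are made up for this example; the awk logic is the same as above. The result is piped through sort because for(i in a) does not guarantee any order.

```shell
# Hypothetical helper: same logic as above, but for comma-separated input.
max_per_key_csv() {
  awk -F',' '
    { f = ($1 in b) }
    f && b[$1] < $2 || !f { a[$1] = $0; b[$1] = $2 }
    END { for (i in a) print a[i] }
  ' "$@" | sort   # for (i in a) has no guaranteed order, so sort the result
}

printf '%s\n' 'a,03,w' 'a,10,x' 'a,01,y' 'b,20,w' 'b,01,x' | max_per_key_csv
# a,10,x
# b,20,w
```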

Upvotes: 1
