Reputation: 51
I have 3 columns
a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z
Expected output is
a 10 x
b 20 w
c 12 z
i.e. I need to sort by column 2 without changing the order of column 1, then pick, for each value in column 1, the line with the maximum value in column 2.
Upvotes: 4
Views: 134
Reputation: 92854
Two approaches (choose one you like):
1) sort + uniq "trick":
sort -k1,1 -k2,2rn file | uniq -w1
-k1,1 - sort lines by the 1st field first
-k2,2rn - then sort lines by the 2nd field numerically, in reverse order
uniq -w1 - output unique lines, comparing no more than 1 character per line (adjustable with -w<number>)
The output:
a 10 x
b 20 w
c 12 z
2) Simply with GNU datamash tool:
datamash -Wsf -g1 max 2 <file | cut -f1-3
The output:
a 10 x
b 20 w
c 12 z
Upvotes: 4
Reputation: 133600
You could also try the following:
awk '
{
b[$1]=a[$1]>$2?(b[$1]?b[$1]:$0):$0;
a[$1]=a[$1]>$2?a[$1]:$2;
}
END{
for(i in a){
print b[i]
}
}
' Input_file
Explanation:
awk '
{                                        ##Main block, run for every input line.
b[$1]=a[$1]>$2?(b[$1]?b[$1]:$0):$0;      ##b[$1] holds the whole line with the largest $2 seen so far for key $1: if the stored maximum a[$1] is greater than $2, keep the stored line, otherwise replace it with the current line.
a[$1]=a[$1]>$2?a[$1]:$2;                 ##a[$1] holds the largest $2 seen so far for key $1: if it is greater than $2, keep it, otherwise set it to the current value of $2.
}
END{                                     ##END block, run after all input has been read.
for(i in a){                             ##Loop over the keys of array a.
print b[i]                               ##Arrays a and b share the same keys, so print the stored line from b for each key.
}
}
' Input_file                             ##Input_file is the file to read.
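One thing to keep in mind with the END loop: awk does not define the iteration order of for(i in a), so the groups are not guaranteed to come out as a, b, c. Here is a minimal sketch that remembers the order in which each key first appeared, assuming a POSIX awk. The names max, line, order and n are only illustrative and are not part of the answer above.

```shell
# Sample input taken from the question.
input='a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z'

result=$(printf '%s\n' "$input" | awk '
!($1 in max) {            # first time this key is seen
    order[++n] = $1       # remember the position of the key
    max[$1] = $2 + 0      # store the value as a number
    line[$1] = $0         # store the whole line
    next
}
$2 + 0 > max[$1] {        # strictly larger value: replace the stored line
    max[$1] = $2 + 0
    line[$1] = $0
}
END {
    for (i = 1; i <= n; i++)   # walk the keys in first-appearance order
        print line[order[i]]
}')

printf '%s\n' "$result"
```

Note that on ties this sketch keeps the first line that reached the maximum, whereas the one-liner above keeps the last one.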
Upvotes: 1
Reputation: 158080
You can use the UNIX commands sort and awk:
sort -k1,1 -k2,2nr file | awk '!seen[$1]++'
To apply them to the buffer in vim:
:%!sort -k1,1 -k2,2nr | awk '\!seen[$1]++'
Explanation:
The sort command sorts the input on two levels: first on column 1, then on column 2 numerically in reverse order. That gives you the following intermediate output:
a 10 x
a 03 w
a 01 y
b 20 w
b 01 x
c 12 z
c 10 y
c 02 w
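We pipe that to a little awk script which maintains an array variable seen, indexed by column 1. seen[$1]++ is 0 the first time a key appears and non-zero afterwards. Since the logic is inverted by !, only the first line for each value of column 1 is printed; once we've seen column 1 before, we won't print it again: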
a 10 x <-- print
a 03 w
a 01 y
b 20 w <-- print
b 01 x
c 12 z <-- print
c 10 y
c 02 w
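Because the deduplication step compares the whole first field, this should also work when column 1 holds keys longer than one character. A small sketch, assuming made-up keys (alpha, alps, beta) that are not in the question, and LC_ALL=C so the sort order does not depend on the locale:

```shell
# Hypothetical input with multi-character keys in column 1.
input='alpha 03 w
alpha 10 x
beta 20 w
beta 01 x
alps 07 y'

# Sort by key, then by value (numeric, descending); keep the first line per key.
result=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1 -k2,2nr | awk '!seen[$1]++')

printf '%s\n' "$result"
```

The output is sorted by key, so it follows the order of column 1 only if the input was already grouped in sorted order, as it is in the question.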
Upvotes: 1
Reputation: 16997
Input
$ cat infile
a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z
Output
$ awk -F'[[:blank:]]' '{f=($1 in b)}f && b[$1]<$2 || !f{a[$1]=$0;b[$1]=$2}END{for(i in a)print a[i]}' infile
a 10 x
b 20 w
c 12 z
More readable
awk -F'[[:blank:]]' '
{
f=($1 in b)
}
f && b[$1]<$2 || !f{
a[$1]=$0;
b[$1]=$2
}
END{
for(i in a)
print a[i]
}
' infile
Explanation
-F'[[:blank:]]' - sets the input field separator.
f=($1 in b) - variable f holds a boolean status (true=1/false=0), depending on whether the array key ($1) exists in array b.
f && b[$1]<$2 || !f - if f is true and the stored value (b[$1]) is less than the current line's 2nd column value ($2), or (||) !f, meaning the array does not have the key we looked for, then:
a[$1]=$0; - array a, indexed by the first column ($1) of the current line, holds the entire line ($0).
b[$1]=$2 - array b, indexed by the first column ($1) of the current line, holds the 2nd field value ($2).
END { for(i in a) print a[i] } - the END block loops through array a and prints its values.
Note: modify -F'...' to match your file's field separator.
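As an illustration of changing the separator, here is a sketch with a comma-separated version of the data. The CSV layout is an assumption for the example, not something from the question. Because the order of for(i in a) is not defined in awk, the result is piped through sort to make the output order predictable.

```shell
# Hypothetical comma-separated version of the sample data.
input='a,03,w
a,10,x
b,20,w
b,01,x'

# Same logic as above, only the field separator differs.
result=$(printf '%s\n' "$input" | awk -F',' '
{ f = ($1 in b) }
f && b[$1] < $2 || !f { a[$1] = $0; b[$1] = $2 }
END { for (i in a) print a[i] }' | LC_ALL=C sort)

printf '%s\n' "$result"
```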
Upvotes: 1