Reputation: 546
a Linux question: I have the CSV file `data.csv` with the following fields and values:
KEY,LEVEL,DATA
2.456,2,aaa
2.456,1,zzz
0.867,2,bbb
9.775,4,ddd
0.867,1,ccc
2.456,0,ttt
...
The field `KEY` is a float value, while `LEVEL` is an integer. Either field can have repeated values on its own, but the pair (`KEY`, `LEVEL`) is unique. What I would like to do is sort the file by `KEY` and then, for each unique value of `KEY`, keep only the row with the highest value of `LEVEL`.
Sorting is not a problem:
$> sort -t, -k1,2 data.csv # fields: KEY,LEVEL,DATA
0.867,1,ccc
0.867,2,bbb
2.456,0,ttt
2.456,1,zzz
2.456,2,aaa
9.775,4,ddd
...
but then how can I filter the rows so that I get what I want, which is:
0.867,2,bbb
2.456,2,aaa
9.775,4,ddd
...
Is there a way to do this using command-line tools like `sort`, `uniq`, `awk` and so on? Thanks in advance
Upvotes: 1
Views: 3030
Reputation: 241728
Perl solution:
perl -aF, -ne '$h{$F[0]} = [@F[1,2]] if !$h{$F[0]} or $F[1] > $h{$F[0]}[0]
}{
print join ",", $_, @{$h{$_}} for sort {$a<=>$b} keys %h' data.csv
Note that the result differs from the one you requested: the first line contains `bbb`, not `ccc`.
Upvotes: 0
Reputation: 195029
try this line:
your sort...|awk -F, 'k&&k!=$1{print p}{p=$0;k=$1}END{print p}'
output:
kent$ echo "0.867,1,bbb
0.867,2,ccc
2.456,0,ttt
2.456,1,zzz
2.456,2,aaa
9.775,4,ddd"|awk -F, 'k&&k!=$1{print p}{p=$0;k=$1}END{print p}'
0.867,2,ccc
2.456,2,aaa
9.775,4,ddd
The idea is: because your file is already sorted, just go through the input from the top; whenever the first column (`KEY`) changes, print the previous line, which holds the highest `LEVEL` for the previous `KEY`.
Try it with your real data; it should work.
Also, the whole logic (including your sort) could be done by awk in a single process.
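One way to sketch that without pre-sorting (a portable approximation: a single awk pass does all the filtering on the unsorted file, and a small external sort orders the already-reduced output; GNU awk could also sort in-process with `asorti`; assumes `data.csv` has no header row):

```shell
# One-pass filtering in POSIX awk: keep the row with the highest LEVEL
# per KEY, then numerically sort the (already unique) result lines.
# Assumes data.csv has no header row.
awk -F, '!($1 in max) || $2+0 > max[$1]+0 { max[$1] = $2; row[$1] = $0 }
         END { for (k in row) print row[k] }' data.csv |
    sort -t, -k1,1n
```
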
Upvotes: 4
Reputation: 1385
Use:
$> sort -r data.csv | uniq -w 5 | sort
given your floats are all formatted as "0.000" through "9.999" (i.e., the key is always five characters wide, which is what `-w 5` relies on)
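A quick run against the sample data (this relies on GNU `uniq` for `-w`, and on single-digit `LEVEL` values so that the reverse lexical sort puts the highest `LEVEL` first within each `KEY`):

```shell
# sort -r puts the highest LEVEL first within each KEY;
# uniq -w 5 then keeps only the first row per 5-character key prefix.
printf '%s\n' 2.456,2,aaa 2.456,1,zzz 0.867,2,bbb \
              9.775,4,ddd 0.867,1,ccc 2.456,0,ttt |
    sort -r | uniq -w 5 | sort
# → 0.867,2,bbb
#   2.456,2,aaa
#   9.775,4,ddd
```
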
Upvotes: 3