Reputation: 129
I know there have been some questions about this. I tried the methods they mentioned, but they do not work.
My data is in a file named Book1.csv, like this:
Then I used bash code: sort -r -n -k 3,3 Book1.csv > sorted.csv
But the outcome is not what I want. I want the outcome to be like:
In addition, since the first column is the ID and the third column is the score, I want to print the IDs with the highest score. In this case, it should print the two IDs whose score is 50, like this: TRAAAAY128F42A73F0 TRAAAAV128F421A322
How can I achieve this?
Upvotes: 0
Views: 163
Reputation: 23548
While printing all IDs with the highest score can be done in bash with basic Unix commands, I think it's better at this point to switch to an actual scripting language (unless you're in some very limited environment).
Fortunately, perl is everywhere, and this task of printing the IDs with the largest scores can be done as one (long) line in perl:
perl -lne 'if (/^([^,]*),[^,]*,\s*([^,]*)/) {push @{$a{$2}},$1; if($2>$m) {$m=$2;}} END {print "@{$a{$m}}";}' Book1.csv
Upvotes: 0
Reputation: 2509
Assuming that your CSV is comma-separated and does not use another delimiter, this is one way to do it. I think there is probably a way to do most of this, if not all of it, in awk. Unfortunately my knowledge of awk is limited, so here is how I would do it quickly.
First, according to the comments, the -t flag of sort resolved your sorting issue.
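For reference, here is a rough sketch of the sort call from the question with the delimiter set. The file name sample.csv, the middle column, and the two low-scoring rows are made up for illustration; only the two IDs with score 50 come from the question.

```shell
# Hypothetical sample data (assumption: no header row, comma-separated).
cat > sample.csv <<'EOF'
TRAAAAW128F429D538,x,22
TRAAAAY128F42A73F0,y,50
TRAAAAV128F421A322,z,50
TRAAABD128F429CF47,w,7
EOF

# -t, sets the field separator to a comma.
# -k3,3nr sorts on the third field only, numerically, in descending order.
sort -t, -k3,3nr sample.csv > sorted.csv
cat sorted.csv
```

Without -t, sort splits fields on whitespace, so -k 3,3 never sees the score column in a comma-separated file.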
#!/bin/bash
# Path to the CSV file.
mycsv="/path/csv.csv"
# Sort on the third field (numeric, descending), then take the third field of the first line.
# cut needs -d, here because its default delimiter is a tab.
max_val=$(sort -t, -k3,3nr "$mycsv" | head -n1 | cut -d, -f3)
# Use awk to print the first column of every row whose third column equals the maximum.
# Note: the -F flag sets the field delimiter to a comma.
awk -F"," -v awkmax="$max_val" '$3 == awkmax {print $1}' "$mycsv"
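As suspected above, awk can probably do the whole job on its own. The following is a minimal single-pass sketch, not a tested drop-in: it assumes a comma-separated file with no header row and numeric scores in the third column. The file name sample.csv, the middle column, and the low-scoring rows are made up for illustration.

```shell
# Hypothetical sample data (assumption: no header row, comma-separated).
cat > sample.csv <<'EOF'
TRAAAAW128F429D538,x,22
TRAAAAY128F42A73F0,y,50
TRAAAAV128F421A322,z,50
TRAAABD128F429CF47,w,7
EOF

top_ids=$(awk -F, '
  # First row, or a strictly larger score: restart the list with this id.
  NR == 1 || $3 + 0 > max { max = $3 + 0; ids = $1; next }
  # Same score as the current maximum: append this id.
  $3 + 0 == max           { ids = ids " " $1 }
  # Print the collected ids once, unless the file was empty.
  END                     { if (NR > 0) print ids }
' sample.csv)

echo "$top_ids"
# prints: TRAAAAY128F42A73F0 TRAAAAV128F421A322
```

This reads the file once and needs no sort, which may matter for large files. If the real file has a header row, it would have to be skipped first (for example with a NR > 1 guard), since the sketch treats the first line as data.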
Upvotes: 1