Reputation: 129

How to sort by the third column

I know there have been similar questions. I tried the methods they suggest, but they do not work for me.

My data is in a file named Book1.csv, like this:

Then I ran this command in bash: sort -r -n -k 3,3 Book1.csv > sorted.csv

But the result is not what I want. I want the output to look like this:

In addition, since the first column is the id and the third column is the score, I want to print the ids with the highest score. In this case, it should print the two ids whose score is 50, like this: TRAAAAY128F42A73F0 TRAAAAV128F421A322. How can I achieve this?

Upvotes: 0

Answers (2)

Daniel Martin

Reputation: 23548

While printing all ids with the highest score can be done in bash with basic Unix commands, I think it's better at this point to switch to an actual scripting language (unless you're in some very limited environment).

Fortunately, perl is available almost everywhere, and printing the ids with the largest score can be done as one (long) line of perl:

perl -lne 'if (/^([^,]*),[^,]*,\s*([^,]*)/) {push @{$a{$2}},$1; if($2>$m) {$m=$2;}} END {print "@{$a{$m}}";}' Book1.csv

Upvotes: 0

jkdba

Reputation: 2509

Assuming that your csv is comma separated and does not use another delimiter, this is one way to do it. There is probably a way to do most of this, if not all of it, in awk alone. Unfortunately my knowledge of awk is limited, so here is how I would do it quickly.

First, according to the comments, the -t flag of sort resolved your sorting issue.
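As a minimal sketch of why -t matters: without it, sort splits fields on whitespace, so -k 3,3 never sees the third comma-separated column. The file name sample.csv and its rows are hypothetical, made up for illustration; only the two ids with score 50 come from the question.

```shell
# Hypothetical sample data in the form id,title,score.
# Only the two score-50 ids are taken from the question; the rest is invented.
cat > sample.csv <<'EOF'
EXAMPLEID1,songA,12
TRAAAAY128F42A73F0,songB,50
EXAMPLEID2,songC,7
TRAAAAV128F421A322,songD,50
EOF

# -t,      use a comma as the field separator
# -k3,3nr  sort on field 3 only, numerically (n), in descending order (r)
sort -t, -k3,3nr sample.csv > sorted.csv
cat sorted.csv
```

The two score-50 rows end up at the top of sorted.csv and the score-7 row at the bottom.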

#!/bin/bash
# Path to the csv file
mycsv="/path/csv.csv"

# Sort on the third column, numerically and descending, then take the third field of the first line.
# cut needs -d, here because its default delimiter is a tab, not a comma.
max_val=$(sort -t, -k3,3nr "$mycsv" | head -n1 | cut -d, -f3)
# Use awk to print the first column of every row whose third column equals the maximum.
# Note: the -F flag sets awk's field delimiter to a comma.
awk -F"," -v awkmax="$max_val" '$3 == awkmax {print $1}' "$mycsv"
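For the awk-only route mentioned at the top of this answer, one possible sketch (not something tested against your real file) is to track the running maximum and the ids that share it in a single pass, with no sort at all. The function name top_ids is hypothetical. It assumes a plain comma-separated file with no header row, no quoted fields containing commas, and numeric scores in the third column.

```shell
#!/bin/bash
# top_ids FILE: print, on one line, the first-column id of every row whose
# third column equals the largest third-column value in FILE.
# Hypothetical helper; assumes simple comma-separated rows and no header line.
top_ids() {
    awk -F',' '
        # $3 + 0 forces a numeric comparison and tolerates a leading space.
        # A new maximum (or the very first row) resets the list of ids.
        NR == 1 || $3 + 0 > max { max = $3 + 0; ids = $1; next }
        # A tie with the current maximum appends the id to the list.
        $3 + 0 == max           { ids = ids " " $1 }
        # Print nothing for an empty file.
        END                     { if (NR > 0) print ids }
    ' "$1"
}
```

Calling top_ids Book1.csv would then print the matching ids separated by spaces, in the order they appear in the file.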

Upvotes: 1
