DKSRathore

Reputation: 3063

Why is Linux sort not giving me the desired results?

I have a file a.csv with contents similar to the following:

a,b,c
a  ,aa,  a
a b, c, f
a , b, c
a b a b a,a,a
a,a,a
a aa ,a , t

I am trying to sort it using sort -k1 -t, a.csv but it gives the following results:

a,a,a
a  ,aa,  a
a aa ,a , t
a b a b a,a,a
a , b, c
a,b,c
a b, c, f

This is not a proper sort on the 1st column. What am I doing wrong?

Upvotes: 0

Views: 2163

Answers (3)

badp

Reputation: 11813

Try this instead:

sort -k 1,1 -t , a.csv

sort reads -k 1 as "sort from the first field to the end of the line", which defeats the point of passing the argument in the first place.
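To see the difference on a small case, here is a minimal sketch. The two input lines are made up for illustration, and LC_ALL=C is my own addition so that the comparison is plain byte order and the result does not depend on your locale:

```shell
# Two hypothetical CSV lines; LC_ALL=C pins plain byte-order comparison.
input='a b,x
a,z'

# -k1: the key runs from field 1 to the end of the line, commas included,
# so "a b,x" is compared with "a,z" and the space sorts before the comma.
whole_line=$(printf '%s\n' "$input" | LC_ALL=C sort -t, -k1)

# -k1,1: the key is field 1 only, so "a b" is compared with "a".
first_field=$(printf '%s\n' "$input" | LC_ALL=C sort -t, -k1,1)

printf 'with -k1:\n%s\n\nwith -k1,1:\n%s\n' "$whole_line" "$first_field"
```

With -k1 the line a b,x comes first; with -k1,1 the line a,z comes first, because the shorter key "a" sorts before "a b".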

This is documented in the sort man page and warned about in the Examples section:

Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use `:' as the field delimiter:

$ sort -t : -k 2,2n -k 5.3,5.4

Note that if you had written -k 2 instead of -k 2,2, sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.
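If you want to try the man page example, here is a small sketch. The colon-separated records are invented for illustration (they are not from the man page), and LC_ALL=C is again an assumption to keep the ordering reproducible:

```shell
# Hypothetical records with five colon-separated fields.
records='alice:10:x:y:01bb
bob:2:x:y:01cc
carol:10:x:y:01aa'

# Primary key: field 2 compared numerically (2 sorts before 10).
# Tie-breaker: characters 3-4 of field 5 ("bb", "cc", "aa").
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4)

printf '%s\n' "$sorted"
```

bob comes first because 2 < 10 numerically; carol then precedes alice because "aa" sorts before "bb" in the tie-breaking key.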

Upvotes: 2

Yannick Motton

Reputation: 35971

Give this a try: sort -t, -k1,1 a.csv

The man page says that if you omit the end field, sort uses everything from field n to the end of the line as the key:

-k POS1[,POS2]
     The recommended, POSIX, option for specifying a sort field.  The
     field consists of the part of the line between POS1 and POS2 (or
     the end of the line, if POS2 is omitted), _inclusive_.  Fields and
     character positions are numbered starting with 1.  So to sort on
     the second field, you'd use `-k 2,2' See below for more examples.

Upvotes: 2

Eemeli Kantola

Reputation: 5557

You have to specify the end position to be 1, too:

sort -k1,1 -t, a.csv
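As a rough check, this sketch runs that command on the data from the question, fed through a pipe instead of a.csv. I am assuming the order you want is plain byte order on the first column, so I set LC_ALL=C; in other locales spaces and punctuation may be weighted differently and the result can change:

```shell
# Contents of a.csv as given in the question.
csv='a,b,c
a  ,aa,  a
a b, c, f
a , b, c
a b a b a,a,a
a,a,a
a aa ,a , t'

# Sort on field 1 only; lines whose first field is identical ("a")
# fall back to a whole-line comparison.
sorted=$(printf '%s\n' "$csv" | LC_ALL=C sort -k1,1 -t,)

printf '%s\n' "$sorted"
```

Under these assumptions the two lines whose first field is exactly "a" come first, followed by "a ", "a  ", "a aa ", "a b" and "a b a b a".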

Upvotes: 2
