yejinxin
yejinxin

Reputation: 596

sorting with multiple keys with Linux sort command

Say I have this file.

$ cat a.txt
c 1002 4
f 1001 1
d 1003 1
a 1001 3
e 1004 2
b 1001 2

I want to sort it by the second column and then by the third column. Column two are numbers, while column 3 can be treated as string. I know the following command works well.

$ sort -k2,2n -k3,3 a.txt
f 1001 1
b 1001 2
a 1001 3
c 1002 4
d 1003 1
e 1004 2

However, I think sort -k2n a.txt should also work, while it does not.

$ sort -k2n a.txt
a 1001 3
b 1001 2
f 1001 1
c 1002 4
d 1003 1
e 1004 2

Seems like it sorts by column two, and then by column one instead of column three. Why is this happening? Is it a bug or not? Cause sort -k2 a.txt works ok with above data since those numbers are just fixed width.

My sort version is sort (GNU coreutils) 8.15 in cygwin.

Upvotes: 7

Views: 18978

Answers (1)

I find this caution in the GNU sort docs.

Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use ‘:’ as the field delimiter.

      sort -t : -k 2,2n -k 5.3,5.4

Note that if you had written -k 2n instead of -k 2,2n sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.

I'm not sure what it ends up with when it evaluates '1001 3' as a numeric key, but "will not do what you expect" is accurate. It seems clear that the Right Thing to do is to specify each key independently.

The same web page says this about resolving "ties".

Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified.

I'll confess I'm a little mystified about how to interpret that.

Upvotes: 10

Related Questions