Reputation: 187
I have a file named a.csv, which contains:
100008,3
10000,3
100010,5
100010,4
10001,6
100021,7
After running the command sort -k1 -d -t "," a.csv, the result is:
10000,3
100008,3
100010,4
100010,5
10001,6
100021,7
This is unexpected, because 10001 should come before 100010.
I have been trying to understand why this happens for a long time, but couldn't find an answer.
$ sort --version
sort (GNU coreutils) 8.13
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
Upvotes: 0
Views: 81
Reputation: 157
Some of the other responses have assumed this is a numeric sort vs dictionary sort problem. It isn't: even for an alphabetical sort, the output given in the question is incorrect.
To get the correct sorting, you need to change -k1 to -k1,1:
$ sort -k1,1 -d -t "," a.csv
10000,3
100008,3
10001,6
100010,4
100010,5
100021,7
The -k option takes two numbers, the start and end fields to sort (i.e. -ks,e where s is the start and e is the end). By default, the end field is the end of the line. Hence, -k1 is the same as not giving the -k option at all. To show this, compare:
$ printf "1,a,1\n2,aa,2\n" | sort -k2 -t,
1,a,1
2,aa,2
with:
$ printf "1~a~1\n2~aa~2\n" | sort -k2 -t~
2~aa~2
1~a~1
The first sorts a,1 before aa,2, while the second sorts aa~2 before a~1, because in ASCII the comma sorts before the letter a, which in turn sorts before the tilde (, < a < ~).
To get the desired behaviour, therefore, we need to sort only one field. In your case, that means using 1 as both the start and end field, so you specify -k1,1. If you try the two examples above with -k2,2 instead of -k2, you'll find you get the same (correct) ordering in both cases.
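As a quick check of that claim, here is a minimal sketch that reruns both examples with the key limited to field 2. Setting LC_ALL=C is my own assumption, added so the comparison is byte-wise (ASCII) regardless of the local locale; the variable names are purely illustrative.

```shell
# Assumption: force byte-wise (ASCII) comparison so the result is locale-independent.
export LC_ALL=C

# Key restricted to field 2 only, so the delimiter is never part of the key.
comma_sorted=$(printf '1,a,1\n2,aa,2\n' | sort -t, -k2,2)
tilde_sorted=$(printf '1~a~1\n2~aa~2\n' | sort -t'~' -k2,2)

printf '%s\n' "$comma_sorted"
printf '%s\n' "$tilde_sorted"
```

Both pipelines should now put the line whose second field is a ahead of the one whose second field is aa, whichever delimiter is used.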
Many thanks to Eric and Assaf from the coreutils mailing list for pointing this out.
Upvotes: 2
Reputation: 59
You have not found a bug in sort. Your usage bug is that you used '-k1' ("set the key to the first field through the end of the line") instead of '-k1,1' ("set the key to use only the first field"). If you use GNU sort, the --debug option will show you the difference. The delimiter is included in the key as long as the key extends beyond a single field.
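To see the difference on the data from the question, something like the following sketch could be used. It assumes GNU sort (for --debug), recreates a.csv in a temporary directory, and sets LC_ALL=C so the comparison is byte-wise; the temporary directory and the variable names are illustrative only.

```shell
# Assumption: byte-wise comparison, independent of the local locale.
export LC_ALL=C

# Recreate the file from the question in a scratch directory.
workdir=$(mktemp -d)
printf '100008,3\n10000,3\n100010,5\n100010,4\n10001,6\n100021,7\n' > "$workdir/a.csv"

# GNU sort only: --debug annotates each output line to mark the part used as the key.
sort --debug -k1 -d -t "," "$workdir/a.csv"
sort --debug -k1,1 -d -t "," "$workdir/a.csv"

# -k1: the key runs to the end of the line, so it contains the comma; -d then
# ignores that comma, and the lines compare as 100003, 1000083, 1000104, ...
whole_line_key=$(sort -k1 -d -t "," "$workdir/a.csv")

# -k1,1: the key is the first field only, so 10001 sorts before 100010.
first_field_key=$(sort -k1,1 -d -t "," "$workdir/a.csv")

printf '%s\n' "$whole_line_key"
printf '%s\n' "$first_field_key"
rm -r "$workdir"
```

The first result reproduces the ordering from the question, and the second gives the ordering the asker expected.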
Upvotes: 2
Reputation: 995
The sort is alphabetical, not numerical. Replace -d with -n in your option list to sort numerically.
Upvotes: 0
Reputation: 25383
The -d option is for --dictionary-order:
-d, --dictionary-order consider only blanks and alphanumeric characters
But I think you want to use -n (--numeric-sort) instead:
-n, --numeric-sort compare according to string numerical value
So, change your command to look like this:
sort -k1 -n -t "," a.csv
http://man7.org/linux/man-pages/man1/sort.1.html
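For illustration, a numeric sort of the question's data might look like the sketch below. Two things here are my own assumptions rather than part of the command above: the key is limited to the first field with -k1,1, as the other answers suggest, so the comma never enters the key, and LC_ALL=C is set so the result does not depend on the locale. The variable name is illustrative.

```shell
# Assumption: C locale, so no locale-specific number formatting is involved.
export LC_ALL=C

# Numeric comparison on the first comma-separated field only.
numeric_sorted=$(printf '100008,3\n10000,3\n100010,5\n100010,4\n10001,6\n100021,7\n' \
  | sort -t "," -k1,1 -n)

printf '%s\n' "$numeric_sorted"
```

Note that this orders the first field by value (10000, 10001, 100008, ...), which is a different ordering from the alphabetical one produced by -k1,1 -d, so pick whichever matches what you actually need.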
Upvotes: 0
Reputation: 2185
It sorts alphabetically, not numerically, so "," comes before "0", i.e. it behaves more like a dictionary.
Upvotes: 0