Reputation: 37658
I want to sort a text file with the Linux sort command. The file looks like this:
v 1006
v10 1
v 1011
I would expect a result like this:
v 1006
v 1011
v10 1
However, using sort, even with all kinds of options, the v10 1 line stays in the middle. Why? I would understand v10 1 ending up either at the top or at the bottom (depending on whether the space character sorts before or after 1), but why is it kept in the middle?
Upvotes: 18
Views: 9941
Reputation: 4554
sort uses the system locale to determine the sorting order of characters. My guess is that with your locale, it ignores whitespace when comparing lines.
$ cat foo.txt
v 1006
v10 1
v 1011
$ LC_ALL=C sort foo.txt
v 1006
v 1011
v10 1
$ LC_ALL=en_US.utf8 sort foo.txt
v 1006
v10 1
v 1011
Upvotes: 22
Reputation: 34145
Your locale influences how the lines are sorted. For example, I get this with my current locale:
% echo -e "v 1006\nv10 1\nv 1011" | sort
v 1006
v10 1
v 1011
But with the C locale I get this:
% echo -e "v 1006\nv10 1\nv 1011" | LC_ALL=C sort
v 1006
v 1011
v10 1
I'm not really sure why it behaves that way. LC_ALL=C is pretty much equivalent to turning off all unexpected processing and going back to byte-level operations (yeah, I'm skipping the details).
Why other locale settings skip the space is harder to explain, though. If anyone can explain that, it would be good :)
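To make the byte-level part concrete, here is a minimal sketch (not a full explanation of the locale rules). It assumes a POSIX shell with od available, and the variable name sorted is just for illustration. It prints the byte values of a space and of the digit 1, then sorts the same three lines with LC_ALL=C:

```shell
#!/bin/sh
# Show the hex byte values of a space and of the digit 1.
# Space is 0x20 and '1' is 0x31, so space compares lower byte-wise.
printf ' 1' | od -An -tx1

# With LC_ALL=C, sort compares lines byte by byte. The lines first differ
# at the second character: space (0x20) in "v 10.." versus '1' (0x31) in
# "v10 1", so both "v 10.." lines come before "v10 1".
sorted=$(printf 'v 1006\nv10 1\nv 1011\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

Since LC_ALL overrides the other LC_* variables, setting it only for the sort command leaves the rest of the session's locale untouched.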
Upvotes: 4