Karel Bílek

Reputation: 37658

sort not sorting as expected (space and locale)

I want to sort a text file with the Linux sort command. The file looks like this:

v 1006
v10 1
v 1011

I would expect result like this:

v 1006
v 1011
v10 1

However, using sort, even with all kinds of options, the v10 1 line always ends up in the middle. Why? I would understand v10 1 being either at the top or at the bottom (depending on whether the space character sorts before or after 1), but why is it kept in the middle?

Upvotes: 18

Views: 9941

Answers (2)

Tatu Lahtela

Reputation: 4554

sort uses the system locale to determine the collation order of characters. My guess is that in your locale, whitespace is ignored when lines are compared.

$ cat foo.txt 
v 1006
v10 1
v 1011
$ LC_ALL=C sort foo.txt
v 1006
v 1011
v10 1
$ LC_ALL=en_US.utf8 sort foo.txt
v 1006
v10 1
v 1011
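
One rough way to check the "whitespace is ignored" guess is to delete the spaces yourself and sort what is left byte by byte. This is only a sketch of that idea, not a description of how the locale's collation rules actually work internally; the variable name stripped_sorted is made up for the illustration.

```shell
# Delete every space, then sort by raw byte values (C locale).
# If the locale-aware sort effectively ignores spaces, this order
# should match the order seen under en_US.utf8 above.
stripped_sorted=$(printf 'v 1006\nv10 1\nv 1011\n' | tr -d ' ' | LC_ALL=C sort)
printf '%s\n' "$stripped_sorted"
```

With the spaces removed, v101 (from "v10 1") falls between v1006 and v1011, which matches the position of the v10 1 line in the en_US.utf8 output.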

Upvotes: 22

viraptor

Reputation: 34145

Your locale influences how the lines are sorted. For example, I get this with my current locale:

% echo -e "v 1006\nv10 1\nv 1011" | sort
v 1006
v10 1
v 1011

But with the C locale I get this:

% echo -e "v 1006\nv10 1\nv 1011" | LC_ALL=C sort
v 1006
v 1011
v10 1

I'm not entirely sure why it behaves that way. Setting LC_ALL=C essentially turns off locale-specific processing and falls back to byte-level comparison (I'm skipping the details).
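
To make the byte-level comparison concrete, here is a small sketch that prints the byte values of the two characters that decide the order and then repeats the C-locale sort. The variable names bytes and sorted_c are just for this illustration.

```shell
# Hex byte values of a space and the digit 1: 0x20 and 0x31.
bytes=$(printf ' 1' | od -An -tx1)
printf '%s\n' "$bytes"

# In the C locale, sort compares raw bytes, so "v " (0x76 0x20)
# sorts before "v1" (0x76 0x31) and v10 1 ends up last.
sorted_c=$(printf 'v 1006\nv10 1\nv 1011\n' | LC_ALL=C sort)
printf '%s\n' "$sorted_c"
```

Because the space has a smaller byte value than any digit, both lines that start with "v " come before the line that starts with "v1".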

Why other locale settings skip the space is harder to explain, though. If anyone can explain that, it would be good :)

Upvotes: 4
