Reputation: 11
I'm using sort to sort the lines of a file alphabetically, but I get some weird results.
I thought it used the decimal codes of the characters to order them, but that doesn't seem to be the case. For example, 'E' (code 69) comes after 'e' (code 101), and '0' (code 48) comes after ':' (code 58).
I tried
if [[ "E" < "e" ]]; then echo "true"; else echo "false"; fi
and
if [[ "0" < ":" ]]; then echo "true"; else echo "false"; fi
to check whether the comparison follows the character codes, but it doesn't. I'm working on Ubuntu 12.04, and both my environment and the files I'm trying to sort use en_US.UTF-8.
The problem is that I have to parse those files in a Java program that assumes the lines are sorted alphabetically. Because Java compares strings by the codes of their characters, my parsing fails when the lines in the file are sorted in a different order.
Could someone help me solve this, either by forcing sort to use the character codes or by sorting the files in that order some other way?
Many thanks for any help.
Upvotes: 1
Views: 461
Reputation: 161604
See the WARNING in the sort manual:
*** WARNING *** The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.
Try this:
$ LC_ALL=C sort input.txt
0
:
E
e
($ is the shell prompt.)
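The same setting should also affect the [[ ... < ... ]] tests from the question, since bash compares strings using the current locale. Here is a minimal sketch, assuming bash; byte_less is a hypothetical helper name, not something provided by bash or sort:

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if $1 sorts before $2 by byte value.
byte_less() {
    local LC_ALL=C    # C locale inside this function only
    [[ "$1" < "$2" ]]
}

byte_less "E" "e" && echo "true" || echo "false"    # prints "true"  (69 < 101)
byte_less "0" ":" && echo "true" || echo "false"    # prints "true"  (48 < 58)

# Sort sample lines by byte value, then let sort -c confirm the order.
printf '%s\n' e E : 0 | LC_ALL=C sort | LC_ALL=C sort -c \
    && echo "sorted in byte order"
```

Putting LC_ALL=C in front of a single command, as in the sort call above, changes the locale for that command only, so the rest of the session keeps en_US.UTF-8.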
Upvotes: 3