user1594909

Reputation: 11

Inconsistent results with the sort command

I'm using sort to sort the lines of a file alphabetically, but I get some weird results. I thought it ordered characters by their decimal codes, but it doesn't look like it. For example, 'E' (decimal code 69) comes after 'e' (decimal code 101), and '0' (code 48) comes after ':' (code 58).
I tried to use

if [[ "E" < "e" ]]; then echo "true"; else echo "false"; fi

and

if [[ "0" < ":" ]]; then echo "true"; else echo "false"; fi

to check whether they give the right answer, but they don't. I'm working on Ubuntu 12.04, and the encoding of my environment and of the files I'm trying to sort is set to en_US.UTF-8.

The problem is that I have to parse those files in a Java program that assumes the lines are sorted alphabetically. Since Java compares strings according to the decimal codes of their characters, my parsing fails because the lines in the file are not sorted in the same order.

Could someone help me solve this, either by forcing sort to use the decimal codes or by sorting the files in this order some other way?

Many thanks for any help.

Upvotes: 1

Views: 461

Answers (1)

kev

Reputation: 161604

See the WARNING in the sort manual:

   *** WARNING *** The locale specified by the  environment  affects  sort
   order.  Set LC_ALL=C to get the traditional sort order that uses native
   byte values.

Try this:

$ LC_ALL=C sort input.txt 
0
:
E
e

($ is the shell prompt)
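As a rough sketch, the same idea can be checked without an input file. This assumes GNU sort and bash are installed; the variable names (sorted, order_check, cmp_letters, cmp_symbols) are made up for illustration. It also reruns the two [[ ... < ... ]] tests from the question, since bash's < inside [[ ]] compares using the current locale as well.

```shell
# Sort four sample lines by raw byte value (C locale).
sorted=$(printf '%s\n' e : E 0 | LC_ALL=C sort)
printf '%s\n' "$sorted"

# sort -c only checks the order: exit status 0 means already sorted.
if printf '%s\n' "$sorted" | LC_ALL=C sort -c; then
    order_check="sorted"
else
    order_check="not sorted"
fi
echo "$order_check"

# The two tests from the question, run in a bash started under the C locale.
cmp_letters=$(LC_ALL=C bash -c 'if [[ "E" < "e" ]]; then echo "true"; else echo "false"; fi')
cmp_symbols=$(LC_ALL=C bash -c 'if [[ "0" < ":" ]]; then echo "true"; else echo "false"; fi')
echo "$cmp_letters $cmp_symbols"
```

To make the whole session behave this way, export LC_ALL=C before running sort. Note that this overrides every other locale setting for that session, so it is usually safer to set it per command as above.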

Upvotes: 3
