TDierckx

Reputation: 73

Unexpected bash sort behavior

If I create a text file containing the following lines:

>TESTTEXT_10000000
>TESTTEXT_1000000
>TESTTEXT_10000002
>TESTTEXT_10000001

and perform sort myfile, my output is

>TESTTEXT_1000000
>TESTTEXT_10000000
>TESTTEXT_10000001
>TESTTEXT_10000002

However, if I append /1 and /2 to my lines the sort output changes drastically, and I do not know why.

Input:

>TESTTEXT_10000000/1
>TESTTEXT_1000000/1
>TESTTEXT_10000002/1
>TESTTEXT_10000001/1

Output:

>TESTTEXT_10000000/1
>TESTTEXT_1000000/1
>TESTTEXT_10000001/1
>TESTTEXT_10000002/1

Input:

>TESTTEXT_10000000/2
>TESTTEXT_1000000/2
>TESTTEXT_10000002/2
>TESTTEXT_10000001/2

Output:

>TESTTEXT_10000000/2
>TESTTEXT_10000001/2
>TESTTEXT_1000000/2
>TESTTEXT_10000002/2

Is the forward slash being recognised as a separator? Using --field-separator did not alter the behaviour. If so, why is 1000000/2 in between the 10000001/2 and 10000002/2 entries? Using human-numeric sort, numeric sort or other options never brought about consistency. Can anyone help me out here?

Edit: because it seems to be relevant, considering the answers, the value of LC_ALL on this machine is en_GB.UTF-8.

Upvotes: 5

Views: 222

Answers (1)

Andreas Louv

Reputation: 47099

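/ sorts before 0 in plain byte order, but in a locale such as en_GB.UTF-8 sort compares lines using the locale's collation rules rather than byte values, so punctuation like / is not necessarily compared the way you expect. Setting LC_ALL=C forces byte-order comparison.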
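To see what byte-order comparison does with the sample from the question, here is a minimal sketch. The variable names input and sorted_c are only for illustration; prefixing the command with LC_ALL=C makes sort compare raw byte values, where / (0x2F) comes before 0 (0x30).

```shell
# Sample lines from the question (the /2 variant).
input='>TESTTEXT_10000000/2
>TESTTEXT_1000000/2
>TESTTEXT_10000002/2
>TESTTEXT_10000001/2'

# LC_ALL=C applies only to this one sort invocation and
# switches it to plain byte-value comparison.
sorted_c=$(printf '%s\n' "$input" | LC_ALL=C sort)
printf '%s\n' "$sorted_c"
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_10000002/2
```

Comparing this with the output of a plain sort on the same machine shows whether the locale is responsible for the difference.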

In your use case you would probably be able to use -V (version sort):

sort -V myfile
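A rough sketch of version sort on the sample from the question, assuming GNU sort (-V is a GNU extension and may be missing elsewhere). The variable names are only for illustration. Version sort compares runs of digits as numbers, so 1000000 is placed before 10000000 regardless of what follows.

```shell
# Sample lines from the question (the /2 variant).
input='>TESTTEXT_10000000/2
>TESTTEXT_1000000/2
>TESTTEXT_10000002/2
>TESTTEXT_10000001/2'

# -V compares embedded digit runs numerically instead of
# character by character.
sorted_v=$(printf '%s\n' "$input" | sort -V)
printf '%s\n' "$sorted_v"
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_10000002/2
```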

Alternatively, you can specify the separator and the keys to sort on:

sort -t/ -k1,1 myfile
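If the file mixes /1 and /2 entries, a second key can break ties. This is a sketch under that assumption, not something the question asked for: the extra -k2,2n key and the mixed sample input are my additions. With -t/ the slash is only a field separator, so the first key is the text before it and the second key is the number after it.

```shell
# Hypothetical mixed input: two IDs, each with a /1 and a /2 entry.
input='>TESTTEXT_10000000/2
>TESTTEXT_1000000/1
>TESTTEXT_10000000/1
>TESTTEXT_1000000/2'

# -t/      use "/" as the field separator
# -k1,1    sort on the first field only
# -k2,2n   then sort numerically on the second field
sorted_k=$(printf '%s\n' "$input" | sort -t/ -k1,1 -k2,2n)
printf '%s\n' "$sorted_k"
# >TESTTEXT_1000000/1
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/1
# >TESTTEXT_10000000/2
```

Restricting the first key with -k1,1 matters: a bare -k1 would extend the key to the end of the line and bring the slash back into the comparison.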

Upvotes: 3
