Brian Li

Reputation: 45

What is the default order for sort?

Given a.txt:

2n 
4t 
7t 
11t 

After running:

sort a.txt

Output:

11t 
2n 
4t 
7t 

Question:

Why is the output in this order? What is the sort based on (numeric value or something else)?

And when I give it this input:

2
4
7
11
20
30

the output comes out in this order:

11
2
20
30
4
7

I'm so confused. Why is 11 always first?

Upvotes: 0

Views: 838

Answers (2)

Kartone

Reputation: 21

The GNU sort manual says:

all comparisons use the character collating sequence specified by the LC_COLLATE locale.

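You can run locale | grep LC_COLLATE to check your LC_COLLATE setting. In the C (POSIX) locale, sort compares lines by byte (ASCII) value. In a locale such as en_US.UTF-8 it follows that locale's collation rules instead, but the digits 0 through 9 still sort in ascending order, so the result for your input is the same.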

For the a.txt above, sort compares lines character by character, starting with the first character. The ASCII value of 1 is 49, 2 is 50, and so on, so 11t comes before 2n because 1 sorts before 2. The comparison is decided at the first character, and the rest of the line is never consulted.

If you want to compare numerically, use -n. For dictionary order, which considers only blanks and alphanumeric characters, use -d. See man sort for the full list of options.
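
To make the difference concrete, here is a small sketch using the second input from the question. It assumes a POSIX shell and sets LC_ALL=C only so the result does not depend on your locale; the variable names are just for illustration.

```shell
# Default (lexicographic) order: lines are compared character by character.
default_order=$(printf '%s\n' 2 4 7 11 20 30 | LC_ALL=C sort)

# Numeric order: -n compares the leading number of each line by value.
numeric_order=$(printf '%s\n' 2 4 7 11 20 30 | LC_ALL=C sort -n)

printf 'default:\n%s\n\nnumeric:\n%s\n' "$default_order" "$numeric_order"
```

The default run should reproduce the 11, 2, 20, 30, 4, 7 order from the question, while the -n run gives 2, 4, 7, 11, 20, 30.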

Upvotes: 0

codeforester

Reputation: 43039

From man sort:

The sort utility sorts text and binary files by lines. A line is a record separated from the subsequent record by a newline (default) or NUL '\0' character (-z option). A record can contain any printable or unprintable characters. Comparisons are based on one or more sort keys extracted from each line of input, and are performed lexicographically, according to the current locale's collating rules and the specified command-line options that can tune the actual sorting behavior. By default, if keys are not given, sort uses entire lines for comparison.

sort uses alphabetical (lexicographic) order by default. If you want your file to be sorted numerically, use sort -n.

Regarding your specific question about why 11 comes before 2 in the sorted output:

  • lexicographically, any string starting with 1 will always be less than any string that starts with 2
  • sort is not using numeric order by default

You can see the ASCII values of 1 and 2:

printf '%d\n' "'1" "'2"
49
50

Upvotes: 2
