Reputation: 45
Given(a.txt):
2n
4t
7t
11t
After:
sort a.txt
Output:
11t
2n
4t
7t
Question:
Why is it in this order? What is the sort based on (numeric value or something else)?
And when I try this input:
2
4
7
11
20
30
the output comes in this order:
11
2
20
30
4
7
I'm confused: why does 11 always come first?
Upvotes: 0
Views: 838
Reputation: 21
GNU sort manual says:
all comparisons use the character collating sequence specified by the LC_COLLATE locale.
You can run locale | grep LC_COLLATE to check your LC_COLLATE setting. With a locale such as en_US.UTF-8, sort follows that locale's collation rules; for digits this gives the same order as their ASCII values. (Setting LC_ALL=C makes sort compare raw byte values.)
For the a.txt you mention above, sort compares the lines character by character, starting with the first character. The ASCII value of 1 is 49, that of 2 is 50, and so on, so 11t comes before 2n because the ASCII value of 1 is smaller than that of 2.
If you want to compare numerically, use -n. For dictionary order, which considers only blanks and alphanumeric characters, use -d. See man sort for details on these options.
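As a rough illustration of the difference between the default comparison and -n, here is a minimal sketch that recreates the a.txt from the question in a temporary directory. The use of mktemp and the variable names are assumptions made for the example, and LC_ALL=C is set only to pin the result to byte-value order so it does not depend on your locale.

```shell
# Recreate the a.txt from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '2n\n4t\n7t\n11t\n' > "$tmpdir/a.txt"

# Default: lines are compared character by character, so "11t" sorts first.
# LC_ALL=C pins the comparison to byte values for a reproducible result.
lexical=$(LC_ALL=C sort "$tmpdir/a.txt")

# -n: the leading number of each line is compared by its numeric value.
numeric=$(LC_ALL=C sort -n "$tmpdir/a.txt")

printf 'default:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
rm -r "$tmpdir"
```

The default run should reproduce the order from the question (11t first), while the -n run should list the lines by their leading number.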
Upvotes: 0
Reputation: 43039
From man sort:
The sort utility sorts text and binary files by lines. A line is a record separated from the subsequent record by a newline (default) or NUL '\0' character (-z option). A record can contain any printable or unprintable characters. Comparisons are based on one or more sort keys extracted from each line of input, and are performed lexicographically, according to the current locale's collating rules and the specified command-line options that can tune the actual sorting behavior. By default, if keys are not given, sort uses entire lines for comparison.
By default, sort uses alphabetical (lexicographic) order. If you want your file to be sorted numerically, use sort -n.
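As for why 11 comes before 2 in the sorted output: the comparison starts with the first character of each line, and the ASCII value of 1 is lower than that of 2, so any line starting with 1 sorts before any line starting with 2. You can print the two values: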
printf '%d\n' "'1" "'2"
49
50
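The same reasoning can be checked against the second input from the question. This is a small sketch, not the only way to do it: the variable names are made up for the example, and LC_ALL=C is an assumption used to force byte-value comparison regardless of your locale.

```shell
# The second input from the question, one number per line.
input=$(printf '2\n4\n7\n11\n20\n30\n')

# Lexicographic: "11" sorts before "2" because '1' (49) < '2' (50)
# at the first character; "2" sorts before "20" because it is a prefix.
lexical=$(printf '%s\n' "$input" | LC_ALL=C sort)

# Numeric: each line is compared by its value as a number.
numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

printf 'lexicographic:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
```

The lexicographic result should match the order reported in the question, and the numeric result should run from 2 up to 30.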
Upvotes: 2