Reputation: 9455
Comparing "î":
string.Compare("î", "I ", StringComparison.CurrentCulture); // returns -1
string.Compare("î", "I ", StringComparison.CurrentCultureIgnoreCase); // returns -1
string.Compare("î", "I", StringComparison.CurrentCulture); // returns 1 (unexpected)
string.Compare("î", "I", StringComparison.CurrentCultureIgnoreCase); // returns 1 (unexpected)
With "i":
string.Compare("i", "I ", StringComparison.CurrentCulture); // returns -1
string.Compare("i", "I ", StringComparison.CurrentCultureIgnoreCase); // returns -1
string.Compare("i", "I", StringComparison.CurrentCulture); // returns -1
string.Compare("i", "I", StringComparison.CurrentCultureIgnoreCase); // returns 0
The current culture was en-GB. I would expect all of these to return 1. Why does having a longer string change the sort order?
Upvotes: 5
Views: 183
Reputation: 109567
See the UTS#10: Unicode Collation Algorithm for the full details.
In particular, see section 1.1 Multi-Level Comparison which explains this behaviour.
There's a table there showing some examples, such as this one:
role < rôle < roles
That is analogous to your example with "I", "î" and "I ", i.e.:
"I" < "î" < "I "
except that where roles has an s at the end, your example has a space at the end. But the same logic applies: it's irrelevant what the extra character is. The simple fact that there is an extra character makes it sort AFTER the "î".
A crucial point from the spec is:
Accent differences are typically ignored, if the base letters differ.
When one string has an extra character, the two strings already differ at the base-letter level, so the accent difference is ignored in your examples with the space at the end.
However, where the strings are the same length and have the same base letters, the accent difference is not ignored and decides the order, which is exactly the result you are seeing.
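To make the two levels concrete, here is a minimal sketch of a toy two-level comparison. It is not the real Unicode Collation Algorithm and not what .NET does internally. `MultiLevelSketch`, `BaseLetters` and `Compare` are hypothetical names made up for this illustration, case is ignored throughout (so it mirrors the IgnoreCase variants in the question), and "base letters" are approximated by stripping combining marks after canonical decomposition.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not part of .NET and not a full UCA implementation.
public static class MultiLevelSketch
{
    // Level 1 key: decompose, drop combining accents, ignore case.
    public static string BaseLetters(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(char.ToUpperInvariant(c));
            }
        }
        return sb.ToString();
    }

    // Returns -1, 0 or 1.
    public static int Compare(string a, string b)
    {
        // Primary level: base letters only. A string that is a prefix of the other sorts first.
        int primary = string.CompareOrdinal(BaseLetters(a), BaseLetters(b));
        if (primary != 0)
        {
            return Math.Sign(primary);
        }

        // Secondary level: accents are consulted only when the base letters are identical.
        string da = a.Normalize(NormalizationForm.FormD).ToUpperInvariant();
        string db = b.Normalize(NormalizationForm.FormD).ToUpperInvariant();
        return Math.Sign(string.CompareOrdinal(da, db));
    }
}
```

With this sketch, `MultiLevelSketch.Compare("î", "I ")` is decided at the primary level ("I" is a prefix of "I ") and gives -1, while `MultiLevelSketch.Compare("î", "I")` ties at the primary level and is decided by the accent, giving 1. The same logic orders role, rôle and roles as in the table from the spec.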
Upvotes: 9
Reputation: 7233
The behavior is weird, I'll give you that, but I don't see why you wouldn't use Ordinal comparisons, given the international context implied here.
For more info, please read this article.
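As a rough sketch of what an ordinal comparison does with the strings from the question: it compares UTF-16 code unit values and involves no culture rules at all. `OrdinalSketch` and `CompareOrdinalSign` are hypothetical names used only for this example; `string.CompareOrdinal` is the real .NET method, and `Math.Sign` is applied because only the sign of its result is documented.

```csharp
using System;

// Hypothetical wrapper for illustration; string.CompareOrdinal is the actual .NET API.
public static class OrdinalSketch
{
    // Returns -1, 0 or 1 based purely on UTF-16 code unit values.
    public static int CompareOrdinalSign(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }
}
```

Here `OrdinalSketch.CompareOrdinalSign("î", "I ")` and `OrdinalSketch.CompareOrdinalSign("î", "I")` both give 1, because 'î' (U+00EE) has a higher code unit value than 'I' (U+0049), whatever follows. The same holds for "i" (U+0069) against "I". Note that this ordering is not linguistic, so it is a poor fit for text that is sorted for display to users.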
Upvotes: -1
Reputation: 6764
From the documentation:
The comparison terminates when an inequality is discovered or both strings have been compared. However, if the two strings compare equal to the end of one string, and the other string has characters remaining, then the string with remaining characters is considered greater. The return value is the result of the last comparison performed.
Upvotes: 2
Reputation: 2258
Basically, because length matters when sorting strings.
"a" is smaller than "a ", right? Makes sense.
Upvotes: 0