Jon Rea

Reputation: 9455

Weird string sorting when 2nd string is longer

Comparing "î"

string.Compare("î", "I ", StringComparison.CurrentCulture);           // returns -1
string.Compare("î", "I ", StringComparison.CurrentCultureIgnoreCase); // returns -1
string.Compare("î", "I", StringComparison.CurrentCulture);            // returns 1 (unexpected)
string.Compare("î", "I", StringComparison.CurrentCultureIgnoreCase);  // returns 1 (unexpected)

With "i"

string.Compare("i", "I ", StringComparison.CurrentCulture);           // returns -1
string.Compare("i", "I ", StringComparison.CurrentCultureIgnoreCase); // returns -1
string.Compare("i", "I", StringComparison.CurrentCulture);            // returns -1
string.Compare("i", "I", StringComparison.CurrentCultureIgnoreCase);  // returns 0

The current culture was en-GB. I would expect all of these to return 1. Why does making the second string longer change the sort order?

Upvotes: 5

Views: 183

Answers (4)

Matthew Watson

Reputation: 109567

See the UTS#10: Unicode Collation Algorithm for the full details.

In particular, see section 1.1 Multi-Level Comparison which explains this behaviour.

There's a table there showing some examples, such as this one:

role < rôle < roles

That is analogous to your example with "I", "î" and "I ", i.e.:

"I" < "î" < "I "

The only difference is that where roles has an s at the end, your example has a space. The same logic applies: it is irrelevant what the extra character is. The simple fact that there is an extra character makes "I " sort after "î".

A crucial point from the spec is:

Accent differences are typically ignored, if the base letters differ.

In your examples with the space at the end, the extra character means the strings already differ at the base-letter level, so the accent difference is ignored.

However, where the strings are the same length, the base letters are identical, so the accent difference is not ignored and decides the order. That is exactly the result you are seeing.
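
To make the multi-level idea concrete, here is a minimal sketch of a two-level comparison. It is a toy, not the Unicode Collation Algorithm and not how string.Compare is implemented: the ToyCollation class and its two-entry accent table are hypothetical, and case is simply ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy two-level comparison. This is NOT the Unicode Collation Algorithm and
// NOT what string.Compare does internally; the accent table below is a
// hypothetical stand-in for real collation data.
public static class ToyCollation
{
    // Hypothetical table: accented letter -> (base letter, accent weight).
    static readonly Dictionary<char, (char Base, int Accent)> Accents =
        new Dictionary<char, (char Base, int Accent)>
        {
            ['î'] = ('i', 1),
            ['ô'] = ('o', 1),
        };

    static (char Base, int Accent) Split(char c)
    {
        c = char.ToLowerInvariant(c); // case is ignored in this sketch
        return Accents.TryGetValue(c, out var entry) ? entry : (c, 0);
    }

    public static int Compare(string a, string b)
    {
        // Level 1: compare base letters only. If one string is a prefix of
        // the other, the longer one sorts later.
        string baseA = new string(a.Select(c => Split(c).Base).ToArray());
        string baseB = new string(b.Select(c => Split(c).Base).ToArray());
        int primary = Math.Sign(string.CompareOrdinal(baseA, baseB));
        if (primary != 0) return primary;

        // Level 2: the base letters are identical (so the lengths match),
        // and only now do accents break the tie.
        for (int i = 0; i < a.Length; i++)
        {
            int diff = Split(a[i]).Accent.CompareTo(Split(b[i]).Accent);
            if (diff != 0) return diff;
        }
        return 0;
    }
}
```

With this sketch, ToyCollation.Compare("î", "I ") is -1 because the trailing space decides things at level 1, while ToyCollation.Compare("î", "I") is 1 because the base letters tie and the accent is consulted at level 2.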

Upvotes: 9

Pedro Lamas

Reputation: 7233

The behavior is weird, I'll give you that, but I don't see why you wouldn't use Ordinal comparisons, given the international context implied here.

For more info, please read this article.
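
As a rough illustration of what an ordinal comparison does with the strings from the question, here is a small sketch. OrdinalDemo is a hypothetical wrapper; the only framework calls are string.CompareOrdinal and Math.Sign.

```csharp
using System;

public static class OrdinalDemo
{
    // Ordinal comparison looks only at UTF-16 code unit values and ignores
    // the current culture. CompareOrdinal guarantees only the sign of its
    // result, so Math.Sign normalises it to -1, 0 or 1.
    public static int Compare(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b));
}
```

Since 'î' is U+00EE, 'i' is U+0069 and 'I' is U+0049, OrdinalDemo.Compare returns 1 for all four pairs in the question, which is what the question expected. The trade-off is that ordinal order is not a linguistic order: for example, every uppercase ASCII letter sorts before every lowercase one.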

Upvotes: -1

bastos.sergio

Reputation: 6764

From the documentation:

The comparison terminates when an inequality is discovered or both strings have been compared. However, if the two strings compare equal to the end of one string, and the other string has characters remaining, then the string with remaining characters is considered greater. The return value is the result of the last comparison performed.

Upvotes: 2

Ricardo Rodrigues

Reputation: 2258

Basically, because length matters when sorting strings.

"a" is smaller than "a ", right? That makes sense.

Upvotes: 0
