Reputation: 77
I have a problem comparing strings. I want to compare the two French strings "éd" and "ef" like this:
Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("éd");
CollationKey b = localeSpecificCollator.getCollationKey("ef");
System.out.println(a.compareTo(b));
This prints -1, but in the French alphabet e comes before é. Yet when we compare only e and é, like this:
Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("é");
CollationKey b = localeSpecificCollator.getCollationKey("e");
System.out.println(a.compareTo(b));
the result is 1. Can you tell me what is wrong in the first part of the code?
Upvotes: 7
Views: 762
Reputation: 328737
This seems to be the expected behaviour and it also seems to be the correct way to sort alphabetically in French.
The Android javadoc gives a hint as to why it behaves like that. I suppose the implementation details in Android are similar, if not identical, to the standard JDK:
A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.
In other words, because your two strings can be ordered by looking only at primary differences (the base letters, ignoring accents), the collator never takes the accent difference into account.
It seems to be compliant with the Unicode Collation Algorithm (UCA):
Accent differences are typically ignored, if the base letters differ.
It also matches the French alphabetical ordering described in the Wikipedia article on "ordre alphabétique" (translated from French):
At first analysis, accented characters, like capital letters, have the same alphabetical rank as the base character.
If several words have the same alphabetical rank, one tries to distinguish them by means of capitals and accents (for e, the order is e, é, è, ê, ë).
In other words: the ordering initially ignores accents and case. Only if two words can't be ordered that way are accents and case taken into account.
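To see that rule in action, here is a minimal sketch that sorts a handful of short strings with the French collator. The class name `FrenchSortDemo` and the helper `sortFrench` are made up for this illustration, and é is written as `\u00e9` only so the example does not depend on the source file encoding.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FrenchSortDemo {
    // Returns a new list sorted with the default French collator.
    // (Hypothetical helper, not part of any library.)
    static List<String> sortFrench(List<String> words) {
        Collator collator = Collator.getInstance(Locale.FRANCE);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // é
        List<String> words = List.of("ef", eAcute + "d", eAcute, "e", "ed");
        // Base letters decide first; accents only break ties.
        System.out.println(sortFrench(words)); // [e, é, ed, éd, ef]
    }
}
```

"éd" lands before "ef" because d comes before f, and the accent only matters for the pairs whose base letters are identical (e/é and ed/éd).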
Upvotes: 5
Reputation:
From the JavaDoc:
You can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ě" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.
Try out different strengths:
localeSpecificCollator.setStrength(Collator.PRIMARY);
and see what happens.
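As a rough sketch of that experiment, the snippet below repeats the two comparisons from the question at several strengths. The class name `StrengthDemo` and the helper `compareKeys` are hypothetical names for this example, and é is again written as `\u00e9`.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {
    // Sign (-1, 0 or 1) of the CollationKey comparison at the given strength.
    // (Hypothetical helper, not part of any library.)
    static int compareKeys(int strength, String left, String right) {
        Collator collator = Collator.getInstance(Locale.FRANCE);
        collator.setStrength(strength);
        CollationKey a = collator.getCollationKey(left);
        CollationKey b = collator.getCollationKey(right);
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // é
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY"};
        for (int i = 0; i < strengths.length; i++) {
            int pair = compareKeys(strengths[i], eAcute + "d", "ef");
            int single = compareKeys(strengths[i], eAcute, "e");
            System.out.println(names[i] + ": \u00e9d vs ef = " + pair
                    + ", \u00e9 vs e = " + single);
        }
    }
}
```

With PRIMARY strength, é and e compare as equal, while "éd" still sorts before "ef" at every strength because d and f are a primary difference.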
Upvotes: 0