java CollationKey sorting wrong

Question

I have a problem comparing strings. I want to compare the two French texts "éd" and "ef" like this

Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("éd");
CollationKey b = localeSpecificCollator.getCollationKey("ef");
System.out.println(a.compareTo(b));

This prints -1, but in the French alphabet e comes before é. However, when we compare only e and é like this

Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("é");
CollationKey b = localeSpecificCollator.getCollationKey("e");
System.out.println(a.compareTo(b));

the result is 1. Can you tell me what is wrong in the first part of the code?

assylias · Accepted Answer

This seems to be the expected behaviour and it also seems to be the correct way to sort alphabetically in French.

The Android javadoc gives a hint as to why it behaves like that - I suppose the implementation details in Android are similar, if not identical, to the standard JDK:

A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.

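In other words, because your two strings can already be ordered by their primary differences alone (the base letters, ignoring accents: d comes before f), the collator does not look at the remaining differences. In your second example the base letters are identical, so the accent is the only difference left and é sorts after e.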
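
A minimal sketch of these levels at work, assuming a standard JDK and the collator's default strength. The class name CollationLevelsDemo and its sign helper are made up for illustration, and é is written as \u00e9 so the source file encoding does not matter:

```java
import java.text.Collator;
import java.util.Locale;

public class CollationLevelsDemo {

    // Hypothetical helper: normalises the result of compare() to -1, 0 or 1.
    static int sign(Collator collator, String a, String b) {
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        Collator french = Collator.getInstance(Locale.FRANCE);

        // Base letters differ (d vs f): the order is decided at the primary
        // level, so the accent on the first letter plays no role.
        System.out.println(sign(french, "\u00e9d", "ef"));

        // Base letters are identical: the accent is the only difference,
        // so it decides the order.
        System.out.println(sign(french, "\u00e9", "e"));
        System.out.println(sign(french, "\u00e9d", "ed"));

        // At PRIMARY strength only base letters are compared.
        Collator primaryOnly = Collator.getInstance(Locale.FRANCE);
        primaryOnly.setStrength(Collator.PRIMARY);
        System.out.println(sign(primaryOnly, "\u00e9", "e"));
    }
}
```

The first two lines reproduce the -1 and 1 from the question. The third should also be 1 (same base letters, so the accent decides), and the last should be 0, since a PRIMARY-strength collator treats é and e as equal.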

It seems to be compliant with the Unicode Collation Algorithm (UCA):

Accent differences are typically ignored, if the base letters differ.

And it also seems to be the correct way to sort alphabetically in French, according to the Wikipedia article on "ordre alphabétique":

En première analyse, les caractères accentués, de même que les majuscules, ont le même rang alphabétique que le caractère fondamental
Si plusieurs mots ont le même rang alphabétique, on tâche de les distinguer entre eux grâce aux majuscules et aux accents (pour le e, on a l'ordre e, é, è, ê, ë)

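In English: accented characters and capitals initially have the same alphabetical rank as the base character, so the order at first ignores accents and case. Only if several words end up with the same rank are accents and case used to tell them apart (for e, the order is e, é, è, ê, ë).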
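
As a rough illustration of that rule applied to a whole list, here is a small sketch that sorts a few words with the French collator. The class FrenchSortDemo and the sortFrench helper are hypothetical names, and the word list is only an example:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FrenchSortDemo {

    // Hypothetical helper: returns a copy of the words sorted with the
    // French collator (Collator implements Comparator<Object>).
    static List<String> sortFrench(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(Locale.FRANCE));
        return sorted;
    }

    public static void main(String[] args) {
        // \u00e9 is é, escaped so the source file encoding does not matter.
        List<String> words = Arrays.asList("ef", "\u00e9d", "ed", "\u00e9", "e");
        System.out.println(sortFrench(words));
    }
}
```

On a standard JDK this should give e, é, ed, éd, ef: words are grouped by their base letters first, and the accent only breaks ties between words whose base letters are the same.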
