Reputation: 574
Is there any Collator implementation which has the same characteristics as MySQL's utf8_general_ci? I need a collator which is case insensitive and does not distinguish German umlauts like ä from the vowel a.
Background:
We recently encountered a bug which was caused by a wrong collation in our table. The collation in use was utf8_general_ci, where utf8_bin would have been the correct one. The particular column had a unique index. The utf8_general_ci collation does not distinguish between words like pöker and poker, so the rows were merged, which was not desired.
I now need to implement a module for our Java application that repairs the affected rows.
Upvotes: 4
Views: 1117
Reputation: 360
You could use the following collator:
Collator collator = Collator.getInstance();
collator.setStrength(Collator.PRIMARY);
A collator with this strength will only consider primary differences significant during comparison.
Consider an example:
System.out.println(compare("abc", "ÀBC", Collator.PRIMARY)); //base char
System.out.println(compare("abc", "ÀBC", Collator.SECONDARY)); //base char + accent
System.out.println(compare("abc", "ÀBC", Collator.TERTIARY)); //base char + accent + case
System.out.println(compare("abc", "ÀBC", Collator.IDENTICAL)); //base char + accent + case + bits
private static int compare(String first, String second, int strength) {
    Collator collator = Collator.getInstance();
    collator.setStrength(strength);
    return collator.compare(first, second);
}
The output is:
0
-1
-1
-1
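For the repair module from the question, a PRIMARY-strength collator can also be used to find the values that would collide. What follows is only a minimal sketch: the class name CollisionFinder and the method groupByPrimary are made up for illustration, and Locale.ROOT is my assumption so that the result does not depend on the machine's default locale. Note that Java's collation rules only approximate utf8_general_ci; they are not guaranteed to match MySQL for every character.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any library.
public class CollisionFinder {

    // Groups values that a PRIMARY-strength collator treats as equal,
    // i.e. values that differ only by accent or case.
    static Map<String, List<String>> groupByPrimary(List<String> values) {
        // Assumption: Locale.ROOT, to avoid locale-specific rules
        // (some locales sort ö as a separate letter).
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);

        // Collator implements Comparator, so a TreeMap built on it
        // collapses all keys the collator considers equal into one entry.
        Map<String, List<String>> groups = new TreeMap<>(collator);
        for (String value : values) {
            groups.computeIfAbsent(value, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> values = List.of("poker", "pöker", "Poker", "joker");
        // Print only the groups with more than one member: the collision candidates.
        groupByPrimary(values).values().stream()
                .filter(group -> group.size() > 1)
                .forEach(System.out::println);
    }
}
```

Every group with more than one member is a set of values that a case- and accent-insensitive unique index would have treated as duplicates, so those are the rows worth checking by hand.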
Have a look at these links for more information:
http://www.javapractices.com/topic/TopicAction.do?Id=207
https://docs.oracle.com/javase/7/docs/api/java/text/Collator.html#PRIMARY
Upvotes: 2