Reputation: 7321
In C# it appears that Grüsse and Grüße are considered equal in most circumstances, as is explained by this nice webpage. I'm trying to find similar behavior in Java, obviously not in java.lang.String.
I thought I was in luck with java.util.regex.Pattern in combination with Pattern.UNICODE_CASE. The Javadoc says:
UNICODE_CASE enables Unicode-aware case folding. When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard.
Yet the following code:
Pattern p = Pattern.compile(Pattern.quote("Grüsse"),
Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
System.out.println(p.matcher("Grüße").matches());
yields false. Why? And is there an alternative way of reproducing the C# case folding behavior?
---- edit ----
As @VGR pointed out, String.toUpperCase will convert ß to SS, which may or may not be case folding (maybe I'm confusing concepts here). However, other characters in the German locale are not "folded"; for instance, ü does not become UE. So to make my initial example more complete, is there a way to make Grüße and Gruesse compare equal in Java?
I was thinking the java.text.Normalizer class could be used to do just that, but it converts ü to u? rather than ue. It also has no option to provide a Locale, which confuses me even more.
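To illustrate the Normalizer behavior described here, a minimal sketch, assuming the u? seen in the output was a u followed by a combining diaeresis (U+0308) that the console could not display. The class name NormalizerDemo is made up for this example.

```java
import java.text.Normalizer;

public class NormalizerDemo {
    // NFD splits a precomposed character into its base letter plus combining marks.
    // It is a Unicode normalization, not a German transliteration, so it never yields "ue".
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String d = decompose("\u00FC"); // ü
        System.out.println(d.length());                        // 2
        System.out.println(Integer.toHexString(d.charAt(1)));  // 308
    }
}
```

Normalization is defined by Unicode independently of language, which would explain why there is no Locale parameter.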
Upvotes: 3
Views: 1332
Reputation: 31407
With the currently accepted answer:
foo.toUpperCase().equals(bar.toUpperCase())
The following inputs do not compare equal even though they should: Grüsse and GRÜẞE; or Grüße and GRÜẞE.
Why is that? Let's look at the uppercased strings:
"Grüsse".toUpperCase(Locale.ROOT) -> "GRÜSSE"
"Grüße".toUpperCase(Locale.ROOT) -> "GRÜSSE"
"GRÜẞE".toUpperCase(Locale.ROOT) -> "GRÜẞE"
As you can see, the uppercase "sharp S" (ẞ) stays that way. To handle that correctly, do this:
foo.toLowerCase(Locale.ROOT).toUpperCase(Locale.ROOT).equals(
bar.toLowerCase(Locale.ROOT).toUpperCase(Locale.ROOT))
Note that the order is important. If you uppercase first and then lowercase, ẞ would only become ß (lowercase sharp S), which still does not compare equal to ss.
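The lowercase-then-uppercase comparison can be wrapped in a small helper. This is only a sketch: the names SharpSCompare, fold and equalsFolded are invented here, and the approach approximates case folding rather than implementing the full Unicode algorithm.

```java
import java.util.Locale;

public class SharpSCompare {
    // Lowercase first so that capital sharp S (U+1E9E) becomes U+00DF,
    // then uppercase so that U+00DF expands to "SS".
    static String fold(String s) {
        return s.toLowerCase(Locale.ROOT).toUpperCase(Locale.ROOT);
    }

    static boolean equalsFolded(String a, String b) {
        return fold(a).equals(fold(b));
    }

    public static void main(String[] args) {
        String capitalSharpS = "GR\u00DC\u1E9EE"; // GRÜẞE
        System.out.println(equalsFolded("Gr\u00FCsse", capitalSharpS));     // Grüsse vs GRÜẞE: true
        System.out.println(equalsFolded("Gr\u00FC\u00DFe", capitalSharpS)); // Grüße vs GRÜẞE: true
    }
}
```

The Unicode escapes are used only so the example does not depend on the source file encoding.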
Upvotes: 1
Reputation: 7321
For reference, the following facts:
Character.toUpperCase() cannot do case folding, as one character must map to one character.
String.toUpperCase() will do case folding.
String.equalsIgnoreCase() uses Character.toUpperCase() internally, so it doesn't do case folding.
Conclusion (as @VGR pointed out): if you need case-insensitive matching with case folding, you need to do:
foo.toUpperCase().equals(bar.toUpperCase())
and not:
foo.equalsIgnoreCase(bar)
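A short sketch of the difference between the two comparisons. IgnoreCaseDemo and viaUpperCase are illustrative names, and Locale.ROOT is an addition here to keep the result independent of the default locale.

```java
import java.util.Locale;

public class IgnoreCaseDemo {
    // String-level uppercasing may change the length (ß becomes "SS").
    static boolean viaUpperCase(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String a = "Gr\u00FCsse";     // Grüsse
        String b = "Gr\u00FC\u00DFe"; // Grüße

        // Character-by-character comparison: the lengths differ, so this is false.
        System.out.println(a.equalsIgnoreCase(b));                     // false
        // Both sides become "GRÜSSE".
        System.out.println(viaUpperCase(a, b));                        // true
        // The char-level mapping leaves ß unchanged.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF'); // true
    }
}
```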
As for the ü and ue equality, I've managed to do it with a RuleBasedCollator and my own rules (one would expect Locale.GERMAN to have that built in, but alas). It looked really silly/over-engineered, and since I needed only the equality, not the sorting/collating, in the end I settled for a simple set of String.replace calls prior to comparison. It sucks, but it works and is transparent/readable.
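Roughly what the String.replace approach could look like. This is a sketch, not the exact code I used: GermanKey, key and sameGerman are made-up names, and the NFC normalization step is an extra assumption so that decomposed umlauts are also caught.

```java
import java.text.Normalizer;
import java.util.Locale;

public class GermanKey {
    // Builds a comparison key: compose, lowercase, then transliterate umlauts and sharp s.
    static String key(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC) // u + combining diaeresis -> ü
                .toLowerCase(Locale.ROOT)                   // also maps capital sharp S to ß
                .replace("\u00E4", "ae")                    // ä
                .replace("\u00F6", "oe")                    // ö
                .replace("\u00FC", "ue")                    // ü
                .replace("\u00DF", "ss");                   // ß
    }

    static boolean sameGerman(String a, String b) {
        return key(a).equals(key(b));
    }

    public static void main(String[] args) {
        System.out.println(sameGerman("Gr\u00FC\u00DFe", "Gruesse")); // Grüße vs Gruesse: true
    }
}
```

The obvious downside is that any literal "ue", "ae" or "oe" is treated like an umlaut, even in words where it is not one.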
Upvotes: 0
Reputation: 5787
Use the ICU4J regular expressions, not the JDK ones: http://userguide.icu-project.org/strings/regexp#TOC-Case-Insensitive-Matching
Upvotes: 1