Reputation: 6345
We want to rename strings in such a way that "strange" characters like German umlauts are translated to their official non-umlaut representation. In Java, is there a function that converts such characters (i.e. handles the mapping), not only for German umlauts but also for French, Czech or Scandinavian characters? The goal is a function that renames files/directories so that Subversion can handle them without problems on different platforms.
This question is similar but without a useful answer.
Upvotes: 2
Views: 6657
Reputation: 2754
The answer is the ICU transliteration rule Any-Latin; De-ASCII; Latin-ASCII;
Here is a PHP-specific example using Transliterator
(sorry for not providing Java code):
$val = 'BEGIN..Ä..Ö..Ü..ä..ö..ü..ẞ..ß..END';
echo Transliterator::create('Any-Latin; De-ASCII; Latin-ASCII;')->transliterate($val);
// output
// BEGIN..AE..OE..UE..ae..oe..ue..SS..ss..END
The normal ASCII rule is Any-Latin; Latin-ASCII;
(output: BEGIN..A..O..U..a..o..u..SS..ss..END)
These rules should work in any language with support for ICU (International Components for Unicode).
Upvotes: 3
Reputation: 1315
Use the ICU Transliterator. It is a generic class for performing these kinds of transliterations. You may need to provide your own map.
Upvotes: 5
Reputation: 6069
You can use the Unicode block property \p{InCombiningDiacriticalMarks} to remove (most) diacritical marks from Strings:
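import java.text.Normalizer;
import java.util.regex.Pattern;

// Compiled once; matches the combining marks that NFD splits off from base letters.
private static final Pattern DIACRITICS = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

public String normalize(String input) {
    String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
    return DIACRITICS.matcher(decomposed).replaceAll("");
}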
This will not replace German umlauts the way you desire, though. It will turn ö into o, ä into a and so on. But maybe that's okay for you, too.
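If the ae/oe/ue spellings are needed and ICU is not an option, one possibility is to apply a small German-specific map before stripping the remaining marks. The following is only a sketch using the JDK alone; the class name UmlautTransliterator, the method toAscii and the contents of the map are assumptions made for illustration, not part of any library:

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class, written for illustration only.
public class UmlautTransliterator {

    // German-specific replacements (assumed mapping). Unicode escapes are
    // used so the source compiles regardless of the file encoding.
    private static final Map<String, String> GERMAN = new LinkedHashMap<>();
    static {
        GERMAN.put("\u00C4", "AE"); // capital A with diaeresis
        GERMAN.put("\u00D6", "OE"); // capital O with diaeresis
        GERMAN.put("\u00DC", "UE"); // capital U with diaeresis
        GERMAN.put("\u00E4", "ae"); // small a with diaeresis
        GERMAN.put("\u00F6", "oe"); // small o with diaeresis
        GERMAN.put("\u00FC", "ue"); // small u with diaeresis
        GERMAN.put("\u1E9E", "SS"); // capital sharp s
        GERMAN.put("\u00DF", "ss"); // small sharp s
    }

    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    public static String toAscii(String input) {
        // Compose first so that "u" + combining diaeresis also matches the map.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);
        for (Map.Entry<String, String> e : GERMAN.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        // Decompose what is left and drop the combining marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        return DIACRITICS.matcher(s).replaceAll("");
    }
}
```

Note that letters without a canonical decomposition, such as the Scandinavian ø or the Polish ł, pass through unchanged, so they would need their own map entries (or the ICU rules mentioned in the other answers).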
Upvotes: 4