Thomas S.

Reputation: 6345

Rename ä, ö, ü to ae, oe, ue

We want to rename strings so that "strange" characters such as German umlauts are translated to their official non-umlaut representation. Is there a function in Java that converts such characters (that is, handles the mapping), not only for German umlauts but also for French, Czech, or Scandinavian characters? The goal is a function that renames files and directories so that Subversion can handle them without problems on different platforms.

This question is similar but without a useful answer.

Upvotes: 2

Views: 6657

Answers (3)

hrvoj3e

Reputation: 2754

The answer is the transform ID Any-Latin; De-ASCII; Latin-ASCII;

This is a PHP-specific answer using Transliterator (sorry for not providing Java code):

$val = 'BEGIN..Ä..Ö..Ü..ä..ö..ü..ẞ..ß..END';
echo Transliterator::create('Any-Latin; De-ASCII; Latin-ASCII;')->transliterate($val);
// output
//    BEGIN..AE..OE..UE..ae..oe..ue..SS..ss..END

The plain ASCII rule is Any-Latin; Latin-ASCII; which gives BEGIN..A..O..U..a..o..u..SS..ss..END instead.

The rules should work in any language with support for ICU (International Components for Unicode).

Upvotes: 3

Jere Käpyaho

Reputation: 1315

Use the ICU Transliterator. It is a generic class for performing these kinds of transliterations. You may need to provide your own map.

Upvotes: 5

user1438038

Reputation: 6069

You can use the Unicode block property \p{InCombiningDiacriticalMarks} to remove (most) diacritical marks from Strings:

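import java.text.Normalizer;
import java.util.regex.Pattern;

// Decompose to NFD so accents become separate combining marks, then strip those marks.
public String normalize(String input) {
  String output = Normalizer.normalize(input, Normalizer.Form.NFD);
  Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

  return pattern.matcher(output).replaceAll("");
}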

This will not replace German umlauts the way you desire, though. It will turn ö into o, ä into a and so on. But maybe that's okay for you, too.
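If the German umlauts do need to become ae, oe and ue, one option is to replace them by hand before stripping the remaining marks. The following is only a minimal sketch using the standard library: the class name UmlautRenamer, the method toAscii and the contents of the map are made up for illustration, and the upper-case forms AE, OE, UE are an assumption you may want to change.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class UmlautRenamer {
    // Hypothetical hand-written map for the German special cases (an assumption, extend as needed).
    private static final Map<String, String> GERMAN = new LinkedHashMap<>();
    static {
        GERMAN.put("\u00e4", "ae"); // a with diaeresis
        GERMAN.put("\u00f6", "oe"); // o with diaeresis
        GERMAN.put("\u00fc", "ue"); // u with diaeresis
        GERMAN.put("\u00c4", "AE"); // A with diaeresis
        GERMAN.put("\u00d6", "OE"); // O with diaeresis
        GERMAN.put("\u00dc", "UE"); // U with diaeresis
        GERMAN.put("\u00df", "ss"); // sharp s
        GERMAN.put("\u1e9e", "SS"); // capital sharp s
    }

    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String toAscii(String input) {
        // Compose first, so that a base letter followed by a combining
        // diaeresis is seen as the single precomposed umlaut character.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);
        for (Map.Entry<String, String> e : GERMAN.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        // Decompose what is left and drop the combining marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("Stra\u00dfe_caf\u00e9.txt"));
    }
}
```

Letters that have no decomposition, such as ø or ł, pass through this sketch unchanged, so they would need their own map entries or a full transliteration library.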

Upvotes: 4
