user2841991

Reputation: 289

java replace German umlauts

I have the following problem: I am trying to replace German umlauts like ä, ö, ü in Java, but it simply does not work. Here is my code:

private static String[][] UMLAUT_REPLACEMENTS = { { "Ä", "Ae" }, { "Ü", "Ue" }, { "Ö", "Oe" }, { "ä", "ae" }, { "ü", "ue" }, { "ö", "oe" }, { "ß", "ss" } };
public static String replaceUmlaute(String orig) {
    String result = orig;

    for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
        result = result.replaceAll(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
    }

    return result;
}

An ä remains an ä and so on. I do not know if this issue has something to do with encoding, but the String contains the exact character I am trying to replace.

Thank you in advance

Upvotes: 18

Views: 65014

Answers (8)

user1438038

Reputation: 6069

Your code looks fine; replaceAll() should work as expected.

Try this, if you also want to preserve capitalization (e.g. ÜBUNG will become UEBUNG, not UeBUNG):

private static String replaceUmlaut(String input) {
 
     // replace all lower Umlauts
     String output = input.replace("ü", "ue")
                          .replace("ö", "oe")
                          .replace("ä", "ae")
                          .replace("ß", "ss");
 
     // first replace all capital Umlauts in a non-capitalized context (e.g. Übung)
     output = output.replaceAll("Ü(?=[a-zäöüß ])", "Ue")
                    .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
                    .replaceAll("Ä(?=[a-zäöüß ])", "Ae");
 
     // now replace all the other capital Umlauts
     output = output.replace("Ü", "UE")
                    .replace("Ö", "OE")
                    .replace("Ä", "AE")
                    .replace("ẞ", "SS");
 
     return output;
 }

Source
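The lookahead (?=[a-z...]) is what separates the two cases: it only checks the character after the capital umlaut and does not consume it. A minimal sketch of that single step, assuming a hypothetical LookaheadDemo class and a shortened character class, with the umlaut written as a Unicode escape so the result does not depend on the source file encoding:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample strings are illustrative only.
final class LookaheadDemo {

    // U+00DC is the capital U umlaut. The lookahead (?=[a-z ]) only checks that a
    // lowercase letter or a space follows; the following character is not consumed.
    private static final Pattern CAPITAL_U_BEFORE_LOWERCASE =
            Pattern.compile("\u00dc(?=[a-z ])");

    static String replaceInMixedCase(String input) {
        return CAPITAL_U_BEFORE_LOWERCASE.matcher(input).replaceAll("Ue");
    }

    public static void main(String[] args) {
        // The first word is mixed case, the second is all caps.
        // Only the first capital umlaut is replaced at this stage.
        System.out.println(replaceInMixedCase("\u00dcbung \u00dcBUNG"));
    }
}
```

The all-caps word is left untouched here, which is why the answer needs a final pass that maps the remaining capital umlauts to UE, OE and AE.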

Upvotes: 13

Hui Gui

Reputation: 131

A short solution is to use a transliterator (this is com.ibm.icu.text.Transliterator from ICU4J, not part of the JDK):

Transliterator transliterator = Transliterator.getInstance("de-ASCII");
String umlautReplaced = transliterator.transliterate(text);

Upvotes: -1

JRA_TLL

Reputation: 1361

If you use Apache Commons Lang (or Commons Lang 3) in your project, it would probably be most efficient to use a class like

public class UmlautCleaner {

    private static final String[] UMLAUTE = new String[] {"Ä", "Ö", "Ü", "ä", "ö", "ü", "ß"};
    private static final String[] UMLAUTE_REPLACEMENT = new String[] {"AE", "OE", "UE", "ae", "oe", "ue", "ss"};

    private UmlautCleaner() {
    }

    public static String cleanSonderzeichen(final String s) {
        return StringUtils.stripAccents(StringUtils.replaceEach(s, UMLAUTE, UMLAUTE_REPLACEMENT));
    }
}
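For projects without Commons Lang, roughly the same effect can be sketched with the JDK alone. This is only an approximation under stated assumptions: UmlautCleanerSketch and clean are hypothetical names, the loop stands in for StringUtils.replaceEach, and java.text.Normalizer plus a regex stands in for StringUtils.stripAccents, whose exact behavior may differ for some characters.

```java
import java.text.Normalizer;

// Hypothetical stdlib-only approximation; not the Commons Lang implementation.
final class UmlautCleanerSketch {

    // Same mapping as in the answer, written as Unicode escapes.
    private static final String[] UMLAUTS =
            {"\u00c4", "\u00d6", "\u00dc", "\u00e4", "\u00f6", "\u00fc", "\u00df"};
    private static final String[] REPLACEMENTS =
            {"AE", "OE", "UE", "ae", "oe", "ue", "ss"};

    static String clean(String s) {
        String result = s;
        // Sequential replacement is safe here because no replacement contains an umlaut.
        for (int i = 0; i < UMLAUTS.length; i++) {
            result = result.replace(UMLAUTS[i], REPLACEMENTS[i]);
        }
        // Decompose the remaining accented letters (NFD) and drop the combining marks.
        return Normalizer.normalize(result, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Input is "Creme brulee und Apfel" with French accents and a capital A umlaut.
        System.out.println(clean("Cr\u00e8me br\u00fbl\u00e9e und \u00c4pfel"));
    }
}
```

The order matters: the umlauts have to be replaced before the accents are stripped, otherwise the capital A umlaut would be reduced to a plain A instead of AE.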

Upvotes: 1

dermoritz

Reputation: 13011

I had to modify the answer by user1438038:

private static String replaceUmlaute(String output) {
    String newString = output.replace("\u00fc", "ue")
            .replace("\u00f6", "oe")
            .replace("\u00e4", "ae")
            .replace("\u00df", "ss")
            .replaceAll("\u00dc(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ue")
            .replaceAll("\u00d6(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Oe")
            .replaceAll("\u00c4(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ae")
            .replace("\u00dc", "UE")
            .replace("\u00d6", "OE")
            .replace("\u00c4", "AE");
    return newString;
}

This should work on any target platform, because the Unicode escapes do not depend on the encoding of the source file (I had problems on a Tomcat on Windows).
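To check which characters a string really contains at runtime, one option is to print its code points. A minimal sketch, assuming a hypothetical CodePointDump helper: a correctly compiled ä shows up as the single code point U+00E4, while a UTF-8 source file that was compiled as ISO-8859-1 or windows-1252 typically yields two code points instead.

```java
import java.util.stream.Collectors;

// Hypothetical diagnostic helper; the class and method names are illustrative.
final class CodePointDump {

    // Formats every code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // A correctly compiled lowercase a umlaut is one code point.
        System.out.println(describe("\u00e4"));
        // What the same letter usually looks like after UTF-8 bytes were read as Latin-1.
        System.out.println(describe("\u00c3\u00a4"));
    }
}
```

Calling describe on both the literal in the replacement table and on the incoming string shows quickly whether the two sides actually hold the same character.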

Upvotes: 9

user2841991

Reputation: 289

This finally worked for me:

private static String[][] UMLAUT_REPLACEMENTS = { { new String("Ä"), "Ae" }, { new String("Ü"), "Ue" }, { new String("Ö"), "Oe" }, { new String("ä"), "ae" }, { new String("ü"), "ue" }, { new String("ö"), "oe" }, { new String("ß"), "ss" } };
public static String replaceUmlaute(String orig) {
    String result = orig;

    for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
        result = result.replace(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
    }

    return result;
}

So thanks for all your answers and help. In the end it was a mixture of nafas's answer (the new String) and Joop Eggen's (the correct replace statement). You got my upvote, thanks a lot!

Upvotes: 7

nafas

Reputation: 5423

ENCODING ENCODING ENCODING....

Different input sources may complicate string encoding. For example, one may use UTF-8 while another uses an ISO encoding such as ISO-8859-1.

Some people reported that the code works for them, so most likely your strings are encoded differently somewhere along the way (bytes decoded with the wrong charset yield different characters, so nothing gets replaced).

To solve the problem at its root, you must make sure that each of your sources uses exactly the same encoding.
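A small sketch of why a mismatch makes the replacement silently do nothing. MojibakeDemo and decodeUtf8BytesAsLatin1 are hypothetical names; the method only simulates input whose UTF-8 bytes were decoded with the wrong charset, which is an assumption about what happened, not a diagnosis of the original code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it simulates a charset mismatch on purpose.
final class MojibakeDemo {

    // Encodes s as UTF-8 and decodes the bytes with the wrong charset (ISO-8859-1).
    static String decodeUtf8BytesAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String correct = "B\u00e4r";                         // 3 chars, contains U+00E4
        String garbled = decodeUtf8BytesAsLatin1(correct);   // 4 chars, no U+00E4 in it

        // The correctly decoded string is replaced as expected.
        System.out.println(correct.replace("\u00e4", "ae"));
        // The garbled string does not contain the search character, so it stays unchanged.
        System.out.println(garbled.replace("\u00e4", "ae").equals(garbled));
    }
}
```

The garbled string looks almost right when printed on some consoles, which is why this kind of problem is easy to miss.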

Try this exercise; hopefully it helps you solve your problem:

1. Try this:

// needs java.util.Arrays, java.nio.charset.Charset and java.nio.charset.StandardCharsets
System.out.println(Arrays.toString("Ä".getBytes()));                          // 1: platform default charset
System.out.println(Arrays.toString("Ä".getBytes(StandardCharsets.UTF_8)));    // 2: should match 1 if the default charset is UTF-8
System.out.println(Arrays.toString("Ä".getBytes(Charset.forName("UTF-32")))); // should differ from 1 and 2
System.out.println(Arrays.toString(orig.getBytes(StandardCharsets.UTF_8)));   // look for the byte pattern of 2 in here (this is the hard bit, I guess)
System.out.println(Arrays.toString(orig.getBytes(Charset.forName("UTF-32")))); // look for the UTF-32 byte pattern in here

The next step is to see how the orig string is formed. For example, if you received it from the web, make sure your POST and GET requests use your preferred encoding.

EDIT 1:

try this:

{ { new String("Ä".getBytes(),"UTF-8"), "Ae" }, ... };

If that does not work, try this:

    byte[] bytes = {-61,-124}; //byte representation of Ä in utf-8
    String Ae = new String(bytes,"UTF-8");
    { { Ae, "Ae" }, ... }; //and do for the rest

Upvotes: 4

Klas Lindbäck

Reputation: 33273

Works fine when I try it, so it must be an encoding issue.

Check your system encoding. You may want to add -encoding UTF-8 to your javac compiler command line.

      -encoding encoding
         Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.
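To see what the running JVM considers its default, a quick check like the following can help. DefaultCharsetCheck is a hypothetical name, and note the assumption: this reports the runtime default only, which is not necessarily the encoding javac used when the class was compiled.

```java
import java.nio.charset.Charset;

// Hypothetical helper that reports the charset defaults of the running JVM.
final class DefaultCharsetCheck {

    static String describeDefaults() {
        // Charset.defaultCharset() is what String.getBytes() and new String(byte[]) use.
        // file.encoding is the related system property; it may be unset on some runtimes.
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

If the value differs between the machine where the code works and the one where it fails, that is a strong hint that the encoding is the culprit.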

Upvotes: 1

Vistari

Reputation: 707

I've just tried to run it and it runs fine.

If you're not using regular expressions, I'd use String.replace rather than String.replaceAll, as it's slightly quicker. The main difference between them is that replaceAll interprets its first argument as a regex.
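A tiny illustration of that difference, using a hypothetical ReplaceVsReplaceAll class and a dot as the search string, since the dot is a regex metacharacter:

```java
// Hypothetical demo class; the sample input is illustrative only.
final class ReplaceVsReplaceAll {

    // replace treats the first argument as literal text.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll treats the first argument as a regex, where "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b")); // prints "a-b"
        System.out.println(regex("a.b"));   // prints "---"
    }
}
```

For plain umlaut literals both methods give the same result, so the choice only affects speed and clarity there.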

EDIT: I just noticed that people in the comments said the same before me, so if you've read theirs you can pretty much ignore what I said. As stated, the problem lies elsewhere in your code, as that snippet works as expected.

Upvotes: 1
