Reputation: 289
I have the following problem. I am trying to replace German umlauts like ä, ö, ü in Java, but it simply does not work. Here is my code:
private static String[][] UMLAUT_REPLACEMENTS = {
    { "Ä", "Ae" }, { "Ü", "Ue" }, { "Ö", "Oe" },
    { "ä", "ae" }, { "ü", "ue" }, { "ö", "oe" },
    { "ß", "ss" } };

public static String replaceUmlaute(String orig) {
    String result = orig;
    for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
        result = result.replaceAll(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
    }
    return result;
}
An ä remains an ä, and so on. I do not know whether this issue has something to do with encoding, but the String contains exactly the character I am trying to replace.
Thank you in advance
Upvotes: 18
Views: 65014
Reputation: 6069
Your code looks fine; replaceAll() should work as expected.
Try this if you also want to preserve capitalization (e.g. ÜBUNG will become UEBUNG, not UeBUNG):
private static String replaceUmlaut(String input) {
    // replace all lower Umlauts
    String output = input.replace("ü", "ue")
                         .replace("ö", "oe")
                         .replace("ä", "ae")
                         .replace("ß", "ss");
    // first replace all capital Umlauts in a non-capitalized context (e.g. Übung)
    output = output.replaceAll("Ü(?=[a-zäöüß ])", "Ue")
                   .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
                   .replaceAll("Ä(?=[a-zäöüß ])", "Ae");
    // now replace all the other capital Umlauts
    output = output.replace("Ü", "UE")
                   .replace("Ö", "OE")
                   .replace("Ä", "AE")
                   .replace("ẞ", "SS");
    return output;
}
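As a quick check of the capitalization rules, here is a self-contained copy of the method above with a small main. The only change is that the umlauts are written as Unicode escapes, which is an assumption made here so the demo compiles the same way regardless of the source file encoding; the class name UmlautDemo is hypothetical.

```java
public class UmlautDemo {

    // Same logic as replaceUmlaut() above; umlauts written as Unicode escapes
    // so the result does not depend on the encoding javac uses for this file.
    static String replaceUmlaut(String input) {
        // lower-case umlauts and sharp s
        String output = input.replace("\u00fc", "ue")
                .replace("\u00f6", "oe")
                .replace("\u00e4", "ae")
                .replace("\u00df", "ss");
        // capital umlauts followed by a lower-case letter or a space
        output = output.replaceAll("\u00dc(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ue")
                .replaceAll("\u00d6(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Oe")
                .replaceAll("\u00c4(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ae");
        // all remaining capital umlauts and the capital sharp s
        return output.replace("\u00dc", "UE")
                .replace("\u00d6", "OE")
                .replace("\u00c4", "AE")
                .replace("\u1e9e", "SS");
    }

    public static void main(String[] args) {
        System.out.println(replaceUmlaut("\u00dcBUNG"));      // prints UEBUNG
        System.out.println(replaceUmlaut("\u00dcbung"));      // prints Uebung
        System.out.println(replaceUmlaut("Gr\u00fc\u00dfe")); // prints Gruesse
    }
}
```

The lookahead only decides between "Ue" and "UE"; it does not consume the following character, so the rest of the word is left untouched.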
Upvotes: 13
Reputation: 131
A short solution is using a transliterator (com.ibm.icu.text.Transliterator from the ICU4J library):
Transliterator transliterator = Transliterator.getInstance("de-ASCII");
String umlautReplaced = transliterator.transliterate(text);
Upvotes: -1
Reputation: 1361
If you use Apache Commons Lang 3 in your project, a simple option is a class like this:
import org.apache.commons.lang3.StringUtils;

public class UmlautCleaner {
    private static final String[] UMLAUTE = new String[] {"Ä", "Ö", "Ü", "ä", "ö", "ü", "ß"};
    private static final String[] UMLAUTE_REPLACEMENT = new String[] {"AE", "OE", "UE", "ae", "oe", "ue", "ss"};

    private UmlautCleaner() {
    }

    // Replaces the umlauts first, then strips any remaining accents (e.g. é -> e).
    public static String cleanSonderzeichen(final String s) {
        return StringUtils.stripAccents(StringUtils.replaceEach(s, UMLAUTE, UMLAUTE_REPLACEMENT));
    }
}
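If Commons Lang is not available, something similar can be approximated with the JDK alone. This is a sketch, not the Commons implementation: java.text.Normalizer decomposes accented letters (NFD) and the regex \p{M} removes the combining marks, which is roughly what stripAccents does, though results may not match it for every character. The extra NFC step at the start is an assumption added here so that a decomposed "a + combining diaeresis" is also recognized as ä. The class and method names are hypothetical.

```java
import java.text.Normalizer;

public final class UmlautCleanerJdk {
    // Same pairs as UMLAUTE / UMLAUTE_REPLACEMENT, written as Unicode escapes.
    private static final String[] UMLAUTE =
            {"\u00c4", "\u00d6", "\u00dc", "\u00e4", "\u00f6", "\u00fc", "\u00df"};
    private static final String[] REPLACEMENTS =
            {"AE", "OE", "UE", "ae", "oe", "ue", "ss"};

    private UmlautCleanerJdk() {
    }

    public static String clean(String s) {
        // Compose first, so decomposed umlauts become the single characters above.
        String result = Normalizer.normalize(s, Normalizer.Form.NFC);
        // Spell out the German umlauts (none of the replacements contains a search string,
        // so replacing one pair after another gives the same result as replaceEach).
        for (int i = 0; i < UMLAUTE.length; i++) {
            result = result.replace(UMLAUTE[i], REPLACEMENTS[i]);
        }
        // Decompose what is left (e.g. e-acute) and drop the combining marks.
        return Normalizer.normalize(result, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }
}
```

The order matters: if the accents were stripped before the umlaut replacement, ä would already have become a plain a.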
Upvotes: 1
Reputation: 13011
I had to modify the answer by user1438038:
private static String replaceUmlaute(String output) {
    String newString = output.replace("\u00fc", "ue")
            .replace("\u00f6", "oe")
            .replace("\u00e4", "ae")
            .replace("\u00df", "ss")
            .replaceAll("\u00dc(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ue")
            .replaceAll("\u00d6(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Oe")
            .replaceAll("\u00c4(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ae")
            .replace("\u00dc", "UE")
            .replace("\u00d6", "OE")
            .replace("\u00c4", "AE");
    return newString;
}
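This should work on any target platform (I had problems with Tomcat on Windows). Because the umlauts are written as Unicode escapes (\u00fc and so on), the source file contains only ASCII, so the result no longer depends on the encoding javac uses to read it.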
Upvotes: 9
Reputation: 289
This finally worked for me:
private static String[][] UMLAUT_REPLACEMENTS = {
    { new String("Ä"), "Ae" }, { new String("Ü"), "Ue" }, { new String("Ö"), "Oe" },
    { new String("ä"), "ae" }, { new String("ü"), "ue" }, { new String("ö"), "oe" },
    { new String("ß"), "ss" } };

public static String replaceUmlaute(String orig) {
    String result = orig;
    for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
        result = result.replace(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
    }
    return result;
}
So thanks for all your answers and help. In the end it was a mixture of nafas (with the new String) and Joop Eggen (the correct replace statement). You got my upvote, thanks a lot!
Upvotes: 7
Reputation: 5423
ENCODING, ENCODING, ENCODING...
Different input sources may complicate the String encoding. For example, one source may use UTF-8 while another uses an ISO-8859 encoding.
Some people said that the code works for them, so it is most likely that your Strings end up with a different encoding somewhere during processing (a different encoding produces a different byte sequence, so nothing matches and nothing is replaced).
To solve the problem at its root, you must make sure that each of your sources uses exactly the same encoding.
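To make the byte-level argument concrete, here is a small sketch of what happens when UTF-8 bytes are read back with the wrong charset. ISO-8859-1 is just an assumed example of the "other" encoding, and the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingMismatchDemo {

    // Simulates a source that writes UTF-8 bytes which are then read back as ISO-8859-1.
    static String decodeUtf8AsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String upperAUmlaut = "\u00c4"; // the letter A with diaeresis

        // The same character has different byte representations in different encodings.
        System.out.println(Arrays.toString(upperAUmlaut.getBytes(StandardCharsets.UTF_8)));      // prints [-61, -124]
        System.out.println(Arrays.toString(upperAUmlaut.getBytes(StandardCharsets.ISO_8859_1))); // prints [-60]

        // Decoded with the wrong charset, the two UTF-8 bytes become two unrelated characters ...
        String garbled = decodeUtf8AsLatin1(upperAUmlaut);
        System.out.println(garbled.length()); // prints 2
        // ... so the string no longer contains the A umlaut and replace() finds nothing.
        System.out.println(garbled.replace(upperAUmlaut, "Ae").equals(garbled)); // prints true
    }
}
```

The fix is therefore not in the replace loop but wherever bytes are turned into a String: pass the charset explicitly instead of relying on the platform default.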
Try this exercise; hopefully it helps you solve your problem:
1. Try this:
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

System.out.println(Arrays.toString("Ä".getBytes())); // platform default encoding: 1 and 2 should have the same result if the default is UTF-8
System.out.println(Arrays.toString("Ä".getBytes(StandardCharsets.UTF_8))); // 1 and 2 should have the same result
System.out.println(Arrays.toString("Ä".getBytes(Charset.forName("UTF-32")))); // should give a different result from 1 and 2
System.out.println(Arrays.toString(orig.getBytes())); // look at the representation and search for a pattern of numbers (this is the hard bit, I guess)
System.out.println(Arrays.toString(orig.getBytes(Charset.forName("UTF-32")))); // the same, in UTF-32
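A complementary check, not part of the original answer, is to print the code points of orig instead of its bytes. A precomposed ä shows up as U+00E4, while a decomposed "a + combining diaeresis" shows up as U+0061 U+0308; both look identical on screen, but replace("ä", ...) only matches the first. The class and method names below are hypothetical.

```java
public class CodePointDump {

    // Formats every code point of s as U+XXXX so invisible differences become visible.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\u00e4"));  // precomposed: prints U+00E4
        System.out.println(dump("a\u0308")); // decomposed: prints U+0061 U+0308
    }
}
```

In the real code you would call dump(orig) and compare the output with the code points of the characters in UMLAUT_REPLACEMENTS.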
The next step is to see how the orig string is formed. For example, if you received it from the web, make sure your POST and GET requests use your preferred encoding.
EDIT 1:
Try this:
{ { new String("Ä".getBytes(), StandardCharsets.UTF_8), "Ae" }, ... };
If this one doesn't work, try this:
byte[] bytes = {-61, -124}; // byte representation of Ä in UTF-8
String Ae = new String(bytes, StandardCharsets.UTF_8); // StandardCharsets avoids the checked UnsupportedEncodingException
{ { Ae, "Ae" }, ... }; // and do the same for the rest
Upvotes: 4
Reputation: 33273
Works fine when I try it, so it must be an encoding issue.
Check your system encoding. You may want to add -encoding UTF-8 to your javac compiler command line:
-encoding encoding
Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.
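To see which "platform default" applies on a given machine, a small program like the following prints the JVM's default charset; javac falls back to the platform default when -encoding is omitted, so the two usually agree. Since JDK 18 (JEP 400) the default is UTF-8 on all platforms, while older JDKs on Windows often default to a legacy code page such as windows-1252. The class name is hypothetical. The source file itself would then be compiled with something like javac -encoding UTF-8 ShowDefaultEncoding.java.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {

    // Charset used when no charset is given explicitly,
    // e.g. by String.getBytes() or new String(byte[]).
    static String describe() {
        return "Default charset: " + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // The file.encoding system property, closely tied to that default.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```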
Upvotes: 1
Reputation: 707
I've just tried to run it and it runs fine.
If you're not using regular expressions, I'd use String.replace rather than String.replaceAll, as it can be slightly quicker. The main difference between them is that replaceAll interprets its first argument as a regular expression.
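The difference only shows up when the search text contains regex metacharacters. This sketch (hypothetical class name) uses "." as an example; Pattern.quote and Matcher.quoteReplacement are the standard way to make replaceAll literal. For plain umlauts both methods give the same result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    public static void main(String[] args) {
        String s = "a.b.c";
        // replace() treats both arguments as literal text.
        System.out.println(s.replace(".", "-"));    // prints a-b-c
        // replaceAll() treats the first argument as a regex: "." matches any character.
        System.out.println(s.replaceAll(".", "-")); // prints -----
        // Quoting pattern and replacement makes replaceAll() behave literally again.
        System.out.println(s.replaceAll(Pattern.quote("."), Matcher.quoteReplacement("$"))); // prints a$b$c
        // Umlauts contain no regex metacharacters, so both methods agree for them.
        String word = "\u00e4rger";
        System.out.println(word.replace("\u00e4", "ae").equals(word.replaceAll("\u00e4", "ae"))); // prints true
    }
}
```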
EDIT: I just noticed that people in the comments said the same before me, so if you've read theirs you can pretty much ignore what I said. As stated, the problem lies elsewhere in your code, since that snippet works as expected.
Upvotes: 1