Reputation: 991
I need to develop a new method that replaces all umlauts (ä, ö, ü) in an input string with the corresponding HTML escape codes, and it has to be fast. According to our statistics, only 5% of all input strings contain umlauts. Since the method is expected to be called very frequently, any unnecessary object instantiation should be avoided. Could someone show me a way to do it?
Upvotes: 0
Views: 67
Reputation: 3050
These are the HTML escape codes. Additionally, HTML allows escaping arbitrary characters with numeric codes of the format &#nnnn; (decimal code point)
and equivalently &#xhhhh; (hexadecimal code point).
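To make the numeric form concrete: a numeric character reference is written &#N; with the code point N in decimal, or &#xH; with the code point H in hexadecimal. The sketch below builds both forms for a single character. The class and method names are made up for this example, and it assumes the character fits in one Java char (Basic Multilingual Plane).

```java
// Hypothetical helper, for illustration only.
final class NumericReference {
    // Decimal form: "&#" + code point in base 10 + ";"
    static String decimal(char c) {
        return "&#" + (int) c + ";";
    }

    // Hexadecimal form: "&#x" + code point in base 16 + ";"
    // Integer.toHexString produces lowercase digits, which HTML accepts as well.
    static String hex(char c) {
        return "&#x" + Integer.toHexString(c) + ";";
    }
}
```

For ä (U+00E4), NumericReference.decimal('ä') gives &#228; and NumericReference.hex('ä') gives &#xe4;.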
A simple string-replace is not going to be efficient with so many strings to replace. I suggest you split the string by entity matches, such as this:
String[] parts = str.split("&(#[0-9]+|#x[A-Fa-f0-9]+|[A-Za-z]+);", -1); //Limit -1 keeps the trailing empty part when the string ends with an entity.
if(parts.length <= 1) return str; //No matched entities.
Then you can re-build the string with the replaced parts inserted.
StringBuilder result = new StringBuilder(str.length());
result.append(parts[0]); //First part always exists.
int pos = parts[0].length() + 1; //Skip past the first part and the ampersand.
for(int i = 1;i < parts.length;i++) {
String entityName = str.substring(pos,str.indexOf(';',pos));
if(entityName.matches("#x[A-Fa-f0-9]{1,4}")) {
result.append((char)Integer.parseInt(entityName.substring(2),16)); //Hexadecimal reference, at most 4 digits so it fits in a char.
} else if(entityName.matches("#[0-9]{1,5}") && Integer.parseInt(entityName.substring(1)) <= Character.MAX_VALUE) {
result.append((char)Integer.parseInt(entityName.substring(1))); //Decimal reference. parseInt avoids the octal reading that Integer.decode gives a leading zero.
} else {
switch(entityName) {
case "euml": result.append('ë'); break;
case "auml": result.append('ä'); break;
...
default: result.append("&" + entityName + ";"); //Unknown entity. Give the original string.
}
}
result.append(parts[i]); //Append the text after the entity.
pos += entityName.length() + parts[i].length() + 2; //Skip past the entity name, the semicolon, the following part and the next ampersand.
}
return result.toString();
Rather than copy-pasting this code, type it into your own project by hand. This gives you the opportunity to look at how the code actually works. I didn't run this code myself, so I can't guarantee that it is correct. It can also be made slightly more efficient by pre-compiling the regular expressions.
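A pre-compiled variant might look roughly like the sketch below. It is not the code from this answer: it swaps split() for a single Matcher pass over a Pattern that is compiled once, and it returns the input unchanged when there is no ampersand at all. The class name, the method names and the short list of named entities are assumptions made for the example; a real table would need many more names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only.
final class EntityDecoder {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z]+);");

    static String decode(String str) {
        // Fast path: without an ampersand there is no entity, so nothing is allocated.
        if (str.indexOf('&') < 0) return str;

        Matcher m = ENTITY.matcher(str);
        StringBuilder result = new StringBuilder(str.length());
        int last = 0; // end index of the previous match
        while (m.find()) {
            result.append(str, last, m.start()); // text before this entity
            String replacement = resolve(m.group(1));
            // Unknown entities are copied through unchanged.
            result.append(replacement != null ? replacement : m.group());
            last = m.end();
        }
        result.append(str, last, str.length()); // text after the last entity
        return result.toString();
    }

    // Returns the decoded text, or null when the entity cannot be resolved.
    private static String resolve(String name) {
        if (name.charAt(0) == '#') {
            boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
            try {
                int cp = hex ? Integer.parseInt(name.substring(2), 16)
                             : Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(cp)
                        ? new String(Character.toChars(cp)) : null;
            } catch (NumberFormatException e) {
                return null; // too many digits to fit in an int
            }
        }
        switch (name) { // small illustrative subset of the named entities
            case "auml": return "\u00E4";  // ä
            case "ouml": return "\u00F6";  // ö
            case "uuml": return "\u00FC";  // ü
            case "euml": return "\u00EB";  // ë
            case "szlig": return "\u00DF"; // ß
            default: return null;
        }
    }
}
```

With this sketch, EntityDecoder.decode("Gr&uuml;&szlig;e") should return "Grüße", and a string without any ampersand comes back as the same instance. Iterating with Matcher.find() also avoids the index bookkeeping that split() needs.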
Upvotes: 1