Reputation: 323
Where am I going wrong?
I receive a String from the server with the value "%u0419%u043E".
I try to convert it to a normal String, but I get Chinese characters. That is wrong, because the incoming letters are Cyrillic.
Code:
// String test = "%u0419%u043E"; <--- this is Йо (Cyrillic)
byte[] test = { (byte) 0x25, (byte) 0x75, (byte)0x30, (byte)0x34, (byte)0x31, (byte) 0x39,(byte) 0x25, (byte) 0x75, (byte)0x30, (byte)0x34, (byte)0x33, (byte) 0x45};
String aaa = new String(test, "UTF-16");
aaa = new String(test, "UTF-8");
aaa = new String(test, "ISO-8859-5");
The image shows what I am doing:
Upvotes: 3
Views: 2746
Reputation: 17725
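As far as I know, this is not a standard character encoding, at least not one of the UTF-* or ISO-* charsets. It looks like the non-standard "%uXXXX" escape format (the kind JavaScript's escape() produces): the bytes are just the ASCII characters '%', 'u' and four hex digits per letter, so no charset passed to new String(...) will turn them into Cyrillic.
You need to decode it yourself, e.g.: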
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String decode(String encoded) {
    // "%u" followed by 4 hex digits; capture the digits
    Pattern p = Pattern.compile("%u([0-9a-f]{4})", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(encoded);
    StringBuffer decoded = new StringBuffer(encoded.length());
    // replace every occurrence (and copy the parts between)
    while (m.find()) {
        String ch = Character.toString((char) Integer.parseInt(m.group(1), 16));
        // quoteReplacement keeps a decoded '$' or '\' from being read as a group reference
        m.appendReplacement(decoded, Matcher.quoteReplacement(ch));
    }
    m.appendTail(decoded);
    return decoded.toString();
}
This gives:
System.out.println(decode("%u0419%u043E"));
Йо
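To tie this back to the byte array in the question, here is a minimal, self-contained sketch. It assumes the server bytes are plain ASCII text of the escape sequence, which is what the values in the question suggest. The class name PercentUDemo is made up for illustration, and the sketch only handles "%uXXXX" escapes, not ordinary "%XX" ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
public class PercentUDemo {
    // "%u" followed by exactly four hex digits; the digits are captured.
    private static final Pattern ESCAPE = Pattern.compile("%u([0-9a-fA-F]{4})");

    public static String decode(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer decoded = new StringBuffer(encoded.length());
        while (m.find()) {
            // Turn the four hex digits into a single UTF-16 code unit.
            String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // Quote the replacement so '$' and '\' are inserted literally.
            m.appendReplacement(decoded, Matcher.quoteReplacement(ch));
        }
        m.appendTail(decoded);
        return decoded.toString();
    }

    public static void main(String[] args) {
        // The same bytes as in the question.
        byte[] test = {0x25, 0x75, 0x30, 0x34, 0x31, 0x39,
                       0x25, 0x75, 0x30, 0x34, 0x33, 0x45};
        // Step 1: the bytes are ASCII, so this yields the escape text itself.
        String escaped = new String(test, StandardCharsets.US_ASCII);
        System.out.println(escaped);
        // Step 2: decode the "%uXXXX" escapes into the real characters.
        System.out.println(decode(escaped));
    }
}
```

The first step recovers the literal text "%u0419%u043E" from the bytes; only the second step produces the Cyrillic letters. Whether they display correctly also depends on the console's encoding.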
Upvotes: 2