I am trying to decode a string that may contain several UTF-8 byte sequences written in hex, like this:
"IMPU=H\u’C3A9’tm\u’C3A9’rf\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB".
I want to decode the string above into readable text.
I tried this:
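Each \u’…’ group appears to hold the UTF-8 bytes of one character, written as hex. Worked by hand for C3A9 under the standard two-byte UTF-8 rule (leading byte 110xxxxx, continuation byte 10xxxxxx), the payload bits combine into a single code point:

```latex
\begin{aligned}
b_1 &= \mathtt{C3}_{16} = 110\,00011_2, \qquad b_2 = \mathtt{A9}_{16} = 10\,101001_2 \\
\text{code point} &= (b_1 \bmod 2^5)\cdot 2^6 + (b_2 \bmod 2^6) = 3 \cdot 64 + 41 = 233 = \mathtt{E9}_{16} \\
&\Rightarrow \text{U+00E9} = \text{é}
\end{aligned}
```

The same rule gives C3B6 → 3·64 + 54 = 246 = U+00F6 (ö), which matches the Hungarian word in the sample.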
String hex = "H\\u’C3A9’tm\\u’C3A9’rf\\u’C3B6’ldescsizma,DC=IMPU,DC=C-NTDB";
ByteBuffer buff = ByteBuffer.allocate(hex.length()/2);
for (int i = 0; i < hex.length(); i+=2) {
    buff.put((byte)Integer.parseInt(hex.substring(i, i+2), 16));
}
buff.rewind();
Charset cs = Charset.forName("UTF-8");
CharBuffer cb = cs.decode(buff);
System.out.println(cb.toString());
I don't know how to proceed from here. Does anybody have an idea?
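The hex-to-bytes loop in the question does work when it is given only the hex digits between the quotes; on the whole string it fails at the first pair, because Integer.parseInt("H\\", 16) throws NumberFormatException. A minimal sketch applying that same loop to single escapes (the class and the helper name decodeHexUtf8 are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeOneEscape {
    // Hypothetical helper: the question's loop, fed only the hex digits of one escape
    static String decodeHexUtf8(String hex) {
        ByteBuffer buff = ByteBuffer.allocate(hex.length() / 2);
        for (int i = 0; i < hex.length(); i += 2) {
            // Each pair of hex digits is one byte; parse as int, then narrow to byte
            buff.put((byte) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        buff.rewind(); // buffer is completely filled, so rewind is enough to read it back
        Charset cs = StandardCharsets.UTF_8;
        CharBuffer cb = cs.decode(buff);
        return cb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeHexUtf8("C3A9"));
        System.out.println(decodeHexUtf8("C3B6"));
    }
}
```

What is still missing is a way to find those hex groups inside the larger string and splice the decoded characters back in.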
Here is one way to do it:
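import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "IMPU=H\\u’C3A9’tm\\u’C3A9’rf\\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB";
StringBuffer buf = new StringBuffer();
// Backslash, 'u', ’, then 4, 6 or 8 hex digits (a 2- to 4-byte UTF-8 sequence), then ’
Matcher m = Pattern.compile("\\\\u’([0-9A-F]{4}(?:[0-9A-F]{2}){0,2})’").matcher(input);
while (m.find()) {
    // Hex digits -> raw bytes, then decode those bytes as UTF-8
    byte[] utf8bytes = javax.xml.bind.DatatypeConverter.parseHexBinary(m.group(1));
    String decoded = new String(utf8bytes, StandardCharsets.UTF_8);
    // quoteReplacement keeps any '$' or backslash in the decoded text literal
    m.appendReplacement(buf, Matcher.quoteReplacement(decoded));
}
String output = m.appendTail(buf).toString();
System.out.println(input);
System.out.println(output);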
* Use of DatatypeConverter taken from this SO answer.
Output
IMPU=H\u’C3A9’tm\u’C3A9’rf\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB
IMPU=Hétmérföldescsizma,AC=IMPU,AC=C-NTDB
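The code above calls javax.xml.bind.DatatypeConverter, which is part of JAXB; JAXB was removed from the JDK in Java 11, so newer JDKs need it as a separate dependency. A sketch of the same regex approach without JAXB, assuming Java 9+ for Matcher.replaceAll(Function); the class and the helpers parseHex and decode are hypothetical names:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DecodeUtf8Escapes {
    // Same pattern as the answer: backslash, 'u', ’, 4/6/8 hex digits, ’
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u’([0-9A-F]{4}(?:[0-9A-F]{2}){0,2})’");

    // Hypothetical stand-in for DatatypeConverter.parseHexBinary: two hex digits per byte
    static byte[] parseHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // Replaces every escape with the character its UTF-8 bytes encode
    static String decode(String input) {
        // replaceAll(Function) is Java 9+; its result is still parsed for '$' and
        // backslash, so quoteReplacement keeps the decoded text literal
        return ESCAPE.matcher(input).replaceAll(mr ->
                Matcher.quoteReplacement(
                        new String(parseHex(mr.group(1)), StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        String input = "IMPU=H\\u’C3A9’tm\\u’C3A9’rf\\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB";
        System.out.println(decode(input));
    }
}
```

Because the pattern accepts up to 8 hex digits, three- and four-byte sequences work the same way; for example, E282AC decodes to €. On Java 17+, java.util.HexFormat.of().parseHex could replace the hand-written parseHex.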