TeamZ

Reputation: 361

How to decode a string that contains hex-encoded UTF-8 bytes

I am trying to decode a string that may contain several hex-encoded UTF-8 sequences, like this:

"IMPU=H\u’C3A9’tm\u’C3A9’rf\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB"

I want to decode the string above into readable text.

I tried this:

String hex = "H\\u’C3A9’tm\\u’C3A9’rf\\u’C3B6’ldescsizma,DC=IMPU,DC=C-NTDB";
ByteBuffer buff = ByteBuffer.allocate(hex.length()/2); 
for (int i = 0; i < hex.length(); i+=2) {
    buff.put((byte)Integer.parseInt(hex.substring(i, i+2), 16)); 
} 
buff.rewind(); 
Charset cs = Charset.forName("UTF-8"); 
CharBuffer cb = cs.decode(buff);
System.out.println(cb.toString());

I don't know how to proceed from here. Any ideas would be appreciated.

Upvotes: 1

Views: 281

Answers (1)

Andreas

Reputation: 159086

Here is one way to do it:

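import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "IMPU=H\\u’C3A9’tm\\u’C3A9’rf\\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB";

// Match each quoted escape holding 2 to 4 hex-encoded UTF-8 bytes (4, 6, or 8 hex digits).
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u’([0-9A-F]{4}(?:[0-9A-F]{2}){0,2})’").matcher(input);
while (m.find()) {
    // DatatypeConverter ships with the JDK through Java 8; it was removed in Java 11.
    byte[] utf8bytes = javax.xml.bind.DatatypeConverter.parseHexBinary(m.group(1));
    // quoteReplacement keeps dollar signs and backslashes in the decoded text literal.
    m.appendReplacement(buf, Matcher.quoteReplacement(new String(utf8bytes, StandardCharsets.UTF_8)));
}
String output = m.appendTail(buf).toString();

System.out.println(input);
System.out.println(output);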

* Use of DatatypeConverter taken from this SO answer.

Output

IMPU=H\u’C3A9’tm\u’C3A9’rf\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB
IMPU=Hétmérföldescsizma,AC=IMPU,AC=C-NTDB
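
On Java 11 and later, javax.xml.bind is no longer part of the JDK, so the same idea can be sketched with a small hand-written hex parser instead. This is only a minimal sketch under a few assumptions, not a drop-in replacement: the class name HexEscapeDecoder and the helper hexToBytes are made up for illustration, the quote character is assumed to be U+2019 as in the input above, and the pattern is loosened to accept any even number of hex digits in either case. The quote is written as a Unicode escape so the source compiles regardless of file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and structure are illustrative only.
final class HexEscapeDecoder {
    // U+2019 (right single quotation mark), the quote character in the question's input.
    private static final String QUOTE = "\u2019";

    // A backslash, 'u', then an even number of hex digits between two U+2019 quotes.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u" + QUOTE + "((?:[0-9A-Fa-f]{2})+)" + QUOTE);

    // Turns a hex string such as "C3A9" into the bytes it spells out.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Replaces every escape with the text its bytes decode to as UTF-8.
    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            String decoded = new String(hexToBytes(m.group(1)), StandardCharsets.UTF_8);
            // quoteReplacement stops dollar signs and backslashes from being read as group references.
            m.appendReplacement(buf, Matcher.quoteReplacement(decoded));
        }
        return m.appendTail(buf).toString();
    }

    public static void main(String[] args) {
        String input = "IMPU=H\\u" + QUOTE + "C3A9" + QUOTE + "tm\\u" + QUOTE + "C3A9" + QUOTE
                + "rf\\u" + QUOTE + "C3B6" + QUOTE + "ldescsizma,AC=IMPU,AC=C-NTDB";
        System.out.println(decode(input));
    }
}
```

For the input above, decode should return the same decoded string as shown in the output. Byte sequences that are not valid UTF-8 are not rejected; new String(byte[], Charset) substitutes the replacement character for them, so stricter validation would need a CharsetDecoder.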

Upvotes: 1
