Dónal

Reputation: 187399

decode UTF-16 text

I have a Java servlet that receives data from an upstream system via an HTTP GET request. The request includes a parameter named "text" and another named "charset" that indicates how the text parameter was encoded.

If I instruct the upstream system to send me the text and debug the servlet request params, I see the following:

request.getParameter("charset") == "UTF-16LE"
request.getParameter("text").getBytes() == [0, 84, 1, 0]

The code points (in hex) for the two characters in this string are:

[T]  0054
[Ā]  0100

I cannot figure out how to convert this byte[] back to the String "TĀ". I should mention that I don't entirely trust the charset parameter and suspect the text may actually be encoded as UTF-16BE.

Upvotes: 0

Views: 4092

Answers (2)

user207421

Reputation: 311050

Why are you calling getBytes() at all? You already have the parameter as a String. Calling getBytes(), without specifying a charset, is just an opportunity to mangle the data.
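As a rough illustration of why the charset matters, the sketch below encodes the same String with two explicitly named charsets. The class and method names (GetBytesDemo, toUtf16be, toLatin1) are made up for this example and are not part of the servlet API. The no-argument getBytes() uses the platform default charset, so its result can differ from one machine to the next.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not taken from the original post.
public class GetBytesDemo {
    // Encode with an explicitly named charset: the result is the same on every platform.
    static byte[] toUtf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // ISO-8859-1 cannot represent U+0100, so the encoder substitutes '?' (byte 63).
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "T\u0100"; // "T" followed by U+0100
        System.out.println(Arrays.toString(toUtf16be(text))); // [0, 84, 1, 0]
        System.out.println(Arrays.toString(toLatin1(text)));  // [84, 63]
    }
}
```

The second result shows the kind of silent data loss that an unsuitable charset can cause.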

Upvotes: 0

matts

Reputation: 6897

The bytes you show are in big-endian order (high byte first), so use the String(byteArray, charset) constructor with UTF-16BE:

byte[] bytes = { 0, 84, 1, 0 };
// This constructor declares the checked UnsupportedEncodingException.
String string = new String(bytes, "UTF-16BE"); // "TĀ"
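Since the charset parameter is in doubt, it may help to decode the same bytes under both byte orders and compare the results. The following is a minimal sketch, not a definitive implementation. The class and method names (Utf16Decode, decodeBE, decodeLE) are invented for illustration, and it uses the StandardCharsets constants (Java 7 and later) so that no checked exception has to be handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not taken from the original post.
public class Utf16Decode {
    // Big-endian: each pair of bytes is read high byte first.
    static String decodeBE(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    // Little-endian: each pair of bytes is read low byte first.
    static String decodeLE(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] bytes = { 0, 84, 1, 0 };
        // 00 54 -> U+0054 'T', 01 00 -> U+0100 'Ā'
        System.out.println(decodeBE(bytes).equals("T\u0100"));      // true
        // 00 54 -> U+5400, 01 00 -> U+0001
        System.out.println(decodeLE(bytes).equals("\u5400\u0001")); // true
    }
}
```

Only the big-endian reading yields the expected code points 0054 and 0100, which supports the suspicion that the data is UTF-16BE despite the declared UTF-16LE.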

Upvotes: 6
