Reputation: 3
Can you help me? Can I encode a string in Java as UTF-16, so that "test" is output as 0074 0065 0073 0074? Is there a function for this?
String x = "test";
System.out.println(x);
Upvotes: 0
Views: 381
Reputation: 7808
The standard way in Java is the method getBytes(Charset charset) of class String. To demonstrate, I wrote a small method:
private static void encodingTest() {
    String testStr = "test";
    // Library call (MgntUtils): prints the string as \uXXXX escapes
    System.out.println(StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(testStr));
    // Standard Java: encode to UTF-16 bytes and print each byte as a signed decimal
    StringBuilder sb = new StringBuilder();
    byte[] bytes = testStr.getBytes(StandardCharsets.UTF_16);
    for (byte b : bytes) {
        sb.append(b).append(" ");
    }
    System.out.println(sb.toString());
}
And the output of that method is:
\u0074\u0065\u0073\u0074
-2 -1 0 116 0 101 0 115 0 116
Note that the values 116, 101, 115, 116 are decimal; converted to hex they are 74, 65, 73, and 74, which is what you are looking for. The leading -2 -1 are the bytes 0xFE 0xFF, the byte order mark that StandardCharsets.UTF_16 writes when encoding. The class StringUnicodeEncoderDecoder that you see in my code, and that gives you the output \u0074\u0065\u0073\u0074, is not part of standard Java. It is part of the open source MgntUtils library, written by me, and it could be very useful to you in this case. See the Javadoc for the class StringUnicodeEncoderDecoder. The library itself is available as a Maven artifact or from GitHub as a jar (including source code and Javadoc).
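The signed decimal bytes printed above can be hard to read. Here is a minimal sketch, using only the standard library, that prints the same bytes as two-digit hex. The class name Utf16BytesDemo and the helper toHexBytes are illustrative and not part of the original answer. It also shows that the first two bytes (-2 -1, that is fe ff) are the byte order mark, which StandardCharsets.UTF_16BE leaves out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16BytesDemo {

    // Hypothetical helper: encodes the text with the given charset
    // and renders every byte as two lowercase hex digits.
    static String toHexBytes(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative bytes are formatted as 80..ff
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF_16 prepends a big-endian byte order mark (fe ff)
        System.out.println(toHexBytes("test", StandardCharsets.UTF_16));
        // prints: fe ff 00 74 00 65 00 73 00 74

        // UTF_16BE encodes the same code units without a byte order mark
        System.out.println(toHexBytes("test", StandardCharsets.UTF_16BE));
        // prints: 00 74 00 65 00 73 00 74
    }
}
```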
Here is a modified version of encodingTest that uses the library to print the format you asked for:
private static void encodingTest() {
    String testStr = "test";
    String encoded = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(testStr);
    System.out.println(encoded);
    System.out.println(encoded.replaceAll("\\\\u", " "));
    System.out.println(encoded.replaceAll("\\\\u", ""));
}
And the output would be:
\u0074\u0065\u0073\u0074
0074 0065 0073 0074
0074006500730074
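If you would rather not add a dependency, the same space-separated output can probably be produced with standard Java alone, because a Java char is already one UTF-16 code unit. This is only a sketch: the class name Utf16HexDemo and the method toUtf16Hex are made up for illustration and are not part of MgntUtils or the JDK.

```java
class Utf16HexDemo {

    // Hypothetical helper: formats each UTF-16 code unit of the string
    // as four lowercase hex digits, separated by single spaces.
    static String toUtf16Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Cast to int so the char is formatted as a number, not a character
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtf16Hex("test"));
        // prints: 0074 0065 0073 0074
    }
}
```

Characters outside the Basic Multilingual Plane are stored as two chars (a surrogate pair), so they come out as two four-digit groups.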
Upvotes: 1