Reputation: 2095
A tool is sending me Japanese content as a byte array.
Using Java, I have to read that byte array and display the Japanese content.
I have no idea how to achieve this.
So far I have tried the program below, just to check how this conversion works:
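String s = "業界支出TXT_20150130170955";
byte[] b1;
try {
    // Name the charset explicitly; the no-argument getBytes() uses the platform
    // default and never throws UnsupportedEncodingException.
    b1 = s.getBytes("UTF-8");
    for (int j = 0; j < b1.length; j++) {
        // Prints each byte next to that same byte cast to a char
        System.out.println(b1[j] + "-----------" + (char) b1[j]);
    }
} catch (UnsupportedEncodingException e2) {
    e2.printStackTrace();
}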
This gives me junk output. I know I am doing this entirely wrong, but I have no idea how to turn a byte stream into Japanese characters.
Any help would be appreciated.
Edit 1:
We need to get the Japanese characters from the "decoded" byte array. I tried the following:
byte[] decoded = Base64.decodeBase64("qzD8MMkwGk/hVClSKHWCaYGJCP/GMK0wuTDIMAn/DQAKAA0ACgApUih1xzD8ML8w1lOXX+VlfgCgUt92l15qdfdTfgCgUt92l15+AClSKHVzijB9fgAakKiMfgB+AKsw/DDJMBpP4VQNVE1Sfg==");
try {
    // Try two candidate encodings and print each result
    System.out.println(new String(decoded, "UTF-8") + "\n");
    System.out.println(new String(decoded, "Shift_JIS") + "\n");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
But we are not getting the expected results. Please advise.
Upvotes: 1
Views: 3541
Reputation: 417522
To convert a byte array to a String, you should use the String(byte[] bytes, Charset charset) constructor.
To properly decode the bytes into a sequence of characters, you have to know the character encoding in which to interpret the bytes. The most common is UTF-8.
Example:
// Bytes of UTF-8 encoded Japanese word: "そこ" (there)
byte[] data = new byte[]{-29, -127, -99, -29, -127, -109};
String s = new String(data, StandardCharsets.UTF_8);
System.out.println(s);
Output:
そこ
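The same constructor works for any other encoding, as long as you pass the matching Charset. As a rough sketch: the leading bytes of the Base64 data in the question's edit look like UTF-16LE rather than UTF-8 or Shift_JIS. That is an inference from the byte pattern, not something the sending tool confirms, so verify the encoding with whoever produces the data. The class name DecodeSample is hypothetical, and java.util.Base64 (Java 8+) stands in for the Base64.decodeBase64 call used in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeSample {
    // First 16 Base64 characters of the payload from the question (12 bytes).
    public static final String PREFIX = "qzD8MMkwGk/hVClS";

    // Assumption: the payload is UTF-16LE (two bytes per character, low byte first).
    public static String decodeUtf16Le(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Prints the first six characters of the payload: カード会員利
        System.out.println(decodeUtf16Le(PREFIX));
    }
}
```

If the output is still unreadable for your real data, the bytes are in yet another encoding; trying charsets at random is less reliable than asking the sender which one is used.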
Note that the reverse direction (String => byte[]) can be achieved with the byte[] String.getBytes(Charset charset) method:
String s = "そこ";
byte[] data = s.getBytes(StandardCharsets.UTF_8);
System.out.println(Arrays.toString(data));
Which prints:
[-29, -127, -99, -29, -127, -109]
Final note
Avoid the String constructor that takes only a byte array and no charset, and the String.getBytes() method that has no parameters. Converting between a String and a byte[] always requires an encoding, and if you don't specify one, one is still used: the platform's default encoding. That default can vary from platform to platform or even from run to run, so your code becomes unportable (it could work differently on different machines).
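To make the effect of the charset choice concrete, here is a minimal sketch; the class and method names (ExplicitCharsetDemo, roundTrip) are made up for illustration. It encodes and decodes the same text with an explicitly chosen charset and prints the platform default so you can see what the parameterless variants would silently use on your machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encodes the text with the given charset and decodes it again with the same one.
    public static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // "\u305D\u3053" is the word そこ written with Unicode escapes,
        // so the result does not depend on the source file encoding.
        String original = "\u305D\u3053";

        // What String.getBytes() and new String(byte[]) would use here.
        System.out.println("Platform default: " + Charset.defaultCharset());

        // UTF-8 can represent the text, so the round trip is lossless.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));

        // ISO-8859-1 cannot represent Japanese; getBytes substitutes '?' for each character.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1));
    }
}
```

The last line prints ?? because unmappable characters are replaced while encoding, which is the kind of silent data loss an unsuitable default charset can cause.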
For Java prior to 7.0
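If you use a Java version prior to 7.0 (where the StandardCharsets constants are not available), you can use the constructor and the getBytes() method that take the charset as a String rather than as a Charset. You have to provide the name of the charset, and both declare the checked UnsupportedEncodingException: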
String(byte[] bytes, String charsetName)
byte[] getBytes(String charsetName)
Example:
// From String to byte array:
byte[] data = s.getBytes("UTF-8");
// From byte array back to String (new variable name, since s is already declared):
String decoded = new String(data, "UTF-8");
Upvotes: 2