Onki

Reputation: 2095

How to convert a byte array to Japanese characters

A tool is sending me Japanese content as a byte array.

Using Java, I have to read that byte array and display the Japanese content.

I don't have any idea how to achieve this.

So far I have tried the program below, just to check how this conversion works:

String s = "業界支出TXT_20150130170955";
byte[] b1;
try {
    b1 = s.getBytes();
    for (int j = 0; j < b1.length; j++) {
        // Prints each byte next to that same byte cast to a char
        System.out.println(b1[j] + "-----------" + (char) b1[j]);
    }
} catch (UnsupportedEncodingException e2) {
    // TODO Auto-generated catch block
    e2.printStackTrace();
}

This gives me junk data. I know I am doing this entirely wrong, but I have no idea how to read a byte stream as Japanese characters.

Any help would be appreciated.

Edit 1:

We need to get the Japanese characters from the "decoded" byte array. I tried the following:

byte[] decoded = Base64.decodeBase64("qzD8MMkwGk/hVClSKHWCaYGJCP/GMK0wuTDIMAn/DQAKAA0ACgApUih1xzD8ML8w1lOXX+VlfgCgUt92l15qdfdTfgCgUt92l15+AClSKHVzijB9fgAakKiMfgB+AKsw/DDJMBpP4VQNVE1Sfg==");
try {
    System.out.println(new String(decoded, "UTF-8") + "\n");
    System.out.println(new String(decoded, "SHIFTJIS") + "\n");
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

But we are not getting the expected results. Please advise.

Upvotes: 1

Views: 3541

Answers (1)

icza

Reputation: 417522

To convert a byte array to a String, you should use the String(byte[] bytes, Charset charset) constructor.

To properly decode the bytes into a sequence of characters, you have to know the character encoding in which to interpret the bytes. The most common is UTF-8.

Example:

// Bytes of UTF-8 encoded Japanese word: "そこ" (there)
byte[] data = new byte[]{-29, -127, -99, -29, -127, -109};

String s = new String(data, StandardCharsets.UTF_8);
System.out.println(s);

Output:

そこ

Note that the reverse direction (String => byte[]) can be achieved with the byte[] String.getBytes(Charset charset) method:

String s = "そこ";
byte[] data = s.getBytes(StandardCharsets.UTF_8);
System.out.println(Arrays.toString(data));

Which prints:

[-29, -127, -99, -29, -127, -109]
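The same constructor handles encodings other than UTF-8. As a hedged sketch: the Base64 payload from the question's edit starts with the bytes AB 30 FC 30 C9 30, which look like little-endian UTF-16 code units (U+30AB, U+30FC, U+30C9) rather than UTF-8 or Shift_JIS, so UTF_16LE is a reasonable guess. The class and method names below are illustrative only, and java.util.Base64 (Java 8+) stands in for the Base64.decodeBase64 call used in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Utf16LePayload {
    // Base64 payload copied verbatim from the question's edit.
    static final String PAYLOAD =
        "qzD8MMkwGk/hVClSKHWCaYGJCP/GMK0wuTDIMAn/DQAKAA0ACgApUih1xzD8ML8w1lOXX+VlfgCgUt92l15qdfdTfgCgUt92l15+AClSKHVzijB9fgAakKiMfgB+AKsw/DDJMBpP4VQNVE1Sfg==";

    // Decodes Base64 text to raw bytes, then interprets the bytes as UTF-16LE.
    // UTF-16LE is an assumption based on the byte pattern, not a confirmed fact.
    static String decode(String base64) {
        byte[] decoded = Base64.getDecoder().decode(base64);
        return new String(decoded, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        System.out.println(decode(PAYLOAD));
    }
}
```

If the guess is right, the printed text should begin with カード会員. If the sender documents a different encoding, use that one instead.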

Final note

Avoid the String constructor that takes only a byte array and no charset, and the parameterless String.getBytes() method. Converting between a String and a byte[] always requires an encoding, so if you don't specify one, the platform's default encoding is used. That default can vary from platform to platform, or even from run to run, so your code becomes unportable (it could behave differently on different machines).
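To illustrate why the charset matters, here is a minimal sketch that decodes the same UTF-8 bytes twice: once with the right charset and once with ISO-8859-1, which stands in for a mismatched platform default. The class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetPitfall {
    // Decodes the bytes with an explicitly chosen charset.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of "そこ", as in the example above.
        byte[] utf8 = {-29, -127, -99, -29, -127, -109};

        // What new String(byte[]) and getBytes() would silently use on this machine.
        System.out.println("Platform default: " + Charset.defaultCharset());

        // Correct charset: the two Japanese characters come back.
        System.out.println(decode(utf8, StandardCharsets.UTF_8));

        // Wrong charset: every byte becomes its own character, giving six junk characters.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

The first decode yields a 2-character string, the second a 6-character one, even though the input bytes are identical.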

For Java prior to 7.0

If you use a Java version prior to 7, you can use the String constructor and the getBytes() method that take the charset as a String instead of a Charset. You have to provide the name of the charset:

String(byte[] bytes, String charsetName)

byte[] getBytes(String charsetName)

Example:

// From String to byte array:
byte[] data = s.getBytes("UTF-8");

// From byte array to String (new variable name, since s is already declared):
String s2 = new String(data, "UTF-8");
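Both name-based overloads declare the checked UnsupportedEncodingException, so they need a try/catch (or a throws clause). A minimal sketch of a round trip with that handling follows; the class name, the method name, and the choice to rethrow as IllegalArgumentException are assumptions made for the example.

```java
import java.io.UnsupportedEncodingException;

class CharsetNameRoundTrip {
    // Encodes the text to bytes and decodes it back, using a charset given by name.
    static String roundTrip(String text, String charsetName) {
        try {
            byte[] data = text.getBytes(charsetName);
            return new String(data, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Thrown when the JVM does not know the named charset.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("そこ", "UTF-8"));
    }
}
```

With a supported name such as "UTF-8" the text comes back unchanged; an unknown name surfaces as an exception instead of silently producing junk.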

Upvotes: 2
