Stan Kurilin

Reputation: 15812

Converting binary data to String

If I have some binary data D and convert it to a string S, I expect that converting S back to binary will give me D again. But that turns out to be wrong.

import java.io.IOException;

public class A {
    public static void main(String[] args) throws IOException {
        final byte[] bytes = new byte[]{-114, 104, -35};// In hex: 8E 68 DD
        System.out.println(bytes.length);               //prints 3
        System.out.println(new String(bytes, "UTF-8").getBytes("UTF-8").length); //prints 7
    }
}

Why does this happen?

Upvotes: 1

Views: 2935

Answers (3)

lxbndr

Reputation: 2208

Your data can't be decoded into valid Unicode characters using UTF-8. Look at the decoded string: it consists of 3 characters, 0xFFFD, 0x0068 and 0xFFFD. The first and last are "�", the Unicode replacement character. I think you need to choose another encoding. For example, "CP866" produces a valid string and converts back into the same array.
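To make this concrete, here is a minimal sketch that prints the code points of the decoded string and compares two round trips. The class name RoundTripDemo and the helper roundTrip are made up for illustration, and ISO-8859-1 is used in place of CP866 only because every Java runtime is required to support it; like CP866, it maps each byte value to exactly one character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Hypothetical helper: decode with a charset, then encode again with the same charset.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x8E, 0x68, (byte) 0xDD};

        // UTF-8: each invalid byte is decoded as U+FFFD, which encodes back as three bytes.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        for (char c : decoded.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        System.out.println(roundTrip(bytes, StandardCharsets.UTF_8).length);

        // ISO-8859-1: every byte value has its own char, so the array comes back unchanged.
        System.out.println(Arrays.equals(bytes, roundTrip(bytes, StandardCharsets.ISO_8859_1)));
    }
}
```

A single-byte charset only helps if you really want to treat the data as opaque bytes; the resulting string is not meaningful text.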

Upvotes: 0

John Ericksen

Reputation: 11113

Converting a byte array to a String and back again is not a one-to-one mapping. According to the docs, the String implementation uses a CharsetDecoder to convert the incoming byte array into Unicode. The first and last bytes in your input array evidently don't map to valid Unicode characters, so the decoder substitutes a replacement character for each of them.
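If you would rather get an error than a silent substitution, you can drive a CharsetDecoder yourself and ask it to report bad input. This is only a sketch; the class name StrictDecodeDemo and the method isValidUtf8 are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {
    // Hypothetical helper: true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bad = {(byte) 0x8E, 0x68, (byte) 0xDD};
        byte[] good = "h".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(bad));
        System.out.println(isValidUtf8(good));
    }
}
```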

Upvotes: 2

David Wood

Reputation: 385

It's likely that the bytes you're converting don't actually form a valid string in the encoding you chose. If Java can't decode a byte, it substitutes a replacement character for it. This means that when you convert back to a byte array, the result won't be the same as what you started with. If you try with a valid sequence of bytes, the round trip should succeed.
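As a rough illustration, the sketch below compares bytes that were produced by encoding a real string with the bytes from the question. The class name ValidBytesDemo and the method survivesUtf8RoundTrip are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ValidBytesDemo {
    // Hypothetical helper: does decoding and re-encoding as UTF-8 give back the same bytes?
    static boolean survivesUtf8RoundTrip(byte[] data) {
        byte[] back = new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        // Bytes obtained by encoding a string are valid UTF-8 by construction.
        byte[] valid = "h\u00E9llo".getBytes(StandardCharsets.UTF_8);
        // The bytes from the question are not valid UTF-8.
        byte[] invalid = {(byte) 0x8E, 0x68, (byte) 0xDD};

        System.out.println(survivesUtf8RoundTrip(valid));
        System.out.println(survivesUtf8RoundTrip(invalid));
    }
}
```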

Upvotes: 1
