zuzeep

Reputation: 1

Java: Different byte[] arrays produce the same String in UTF-8

I have two different byte arrays. When I build a String from each byte[], the two Strings are equal if I decode with UTF-8, but different if I decode with ISO-8859-1.

    byte[] valueFir = new byte[]{0, 1, -79};
    byte[] valueSec = new byte[]{0, 1, -80};

    Charset CHARSET = Charset.forName("ISO-8859-1");
    Charset UTF8SET = Charset.forName("UTF-8");
    Charset[] list = new Charset[]{CHARSET, UTF8SET};

    for(int i=0; i<list.length; i++){

        String fir = new String(valueFir,list[i]);
        String sec = new String(valueSec,list[i]);

        Assert.assertNotEquals(fir,sec);
    }

The first assertion (ISO-8859-1) passes, but the second (UTF-8) fails. What's the reason?

Upvotes: 0

Views: 391

Answers (1)

Dawood ibn Kareem

Reputation: 79807

If you look at the Javadoc for the String constructor that you're using, it says

This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement string.

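Now in UTF-8, the bytes -79 (0xB1) and -80 (0xB0) are continuation bytes: they are only valid after a lead byte that starts a multi-byte sequence, so on their own they are malformed input. The first two bytes of each array (0 and 1) decode normally, but the last byte is replaced by the charset's default replacement, which for UTF-8 is U+FFFD (the replacement character). Both arrays therefore decode to the same three-character String, and your assertNotEquals is comparing that String to an identical one.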

In ISO-8859-1, however, every byte value maps to exactly one character, so -79 (0xB1) becomes '±' (U+00B1) and -80 (0xB0) becomes '°' (U+00B0), and you get two different String values.
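
To see this for yourself, here is a minimal sketch. The class name `CharsetDecodeDemo` and the helpers `lenient` and `strict` are made up for illustration; only the JDK calls are real. `lenient` decodes the same way as the constructor in the question, while `strict` uses a `CharsetDecoder` configured to report errors, so bad input throws instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetDecodeDemo {

    // Same behaviour as in the question: malformed input is replaced.
    static String lenient(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Strict variant: throws CharacterCodingException on malformed or
    // unmappable input instead of substituting the replacement string.
    static String strict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] valueFir = {0, 1, -79};
        byte[] valueSec = {0, 1, -80};

        // UTF-8: the last byte of each array is replaced, so the Strings match.
        String firUtf8 = lenient(valueFir, StandardCharsets.UTF_8);
        String secUtf8 = lenient(valueSec, StandardCharsets.UTF_8);
        System.out.println("UTF-8 equal: " + firUtf8.equals(secUtf8));
        System.out.println("UTF-8 last char: U+"
                + Integer.toHexString(firUtf8.charAt(2)).toUpperCase());

        // ISO-8859-1: each byte maps to its own character, so they differ.
        String firLatin = lenient(valueFir, StandardCharsets.ISO_8859_1);
        String secLatin = lenient(valueSec, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 equal: " + firLatin.equals(secLatin));

        // Strict decoding surfaces the problem instead of hiding it.
        try {
            strict(valueFir, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("Strict UTF-8 decode failed: " + e);
        }
    }
}
```

If the bytes are arbitrary binary data rather than encoded text, decoding them as UTF-8 loses information in general. Comparing the arrays directly with `Arrays.equals` avoids the problem.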

Upvotes: 4
