Reputation: 1
I have two different byte arrays. When I build a String from each byte[], the two Strings are equal if I decode with UTF-8, but different if I decode with ISO-8859-1.
byte[] valueFir = new byte[]{0, 1, -79};
byte[] valueSec = new byte[]{0, 1, -80};
Charset CHARSET = Charset.forName("ISO-8859-1");
Charset UTF8SET = Charset.forName("UTF-8");
Charset[] list = new Charset[]{CHARSET, UTF8SET};
for (int i = 0; i < list.length; i++) {
    String fir = new String(valueFir, list[i]);
    String sec = new String(valueSec, list[i]);
    Assert.assertNotEquals(fir, sec);
}
The first assertion (ISO-8859-1) passes, but the second (UTF-8) fails. What's the reason?
Upvotes: 0
Views: 391
Reputation: 79807
If you look at the Javadoc for the String constructor that you're using, it says:
This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement string.
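Now in UTF-8, the bytes -79 (0xB1) and -80 (0xB0) are continuation bytes, which don't map to characters on their own. So the last byte of each of your arrays is malformed in UTF-8, and in both cases it gets replaced with the default replacement string (the character U+FFFD). The first two bytes, 0 and 1, decode normally, so both arrays end up as the same three-character String. Your assertNotEquals is then comparing two equal Strings.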
However, your byte arrays make perfect sense in ISO-8859-1, where every byte maps to exactly one character (0xB1 is ± and 0xB0 is °), so they get converted to two different String values.
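To see this for yourself, here is a minimal sketch that decodes both arrays with each charset and prints the last character. The class name CharsetDecodeDemo and the helpers decodeLenient and isValid are names I made up for illustration. isValid uses a CharsetDecoder set to CodingErrorAction.REPORT, which is one way to have bad input rejected instead of silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {

    // Same behaviour as in the question: malformed input is replaced, never reported.
    static String decodeLenient(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Hypothetical helper: returns false if the bytes are not valid in the charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] valueFir = {0, 1, -79};
        byte[] valueSec = {0, 1, -80};

        String utfFir = decodeLenient(valueFir, StandardCharsets.UTF_8);
        String utfSec = decodeLenient(valueSec, StandardCharsets.UTF_8);
        String isoFir = decodeLenient(valueFir, StandardCharsets.ISO_8859_1);
        String isoSec = decodeLenient(valueSec, StandardCharsets.ISO_8859_1);

        // Last character of each decoded String, as a hex code point.
        System.out.println(Integer.toHexString(utfFir.charAt(2))); // fffd
        System.out.println(Integer.toHexString(utfSec.charAt(2))); // fffd
        System.out.println(Integer.toHexString(isoFir.charAt(2))); // b1
        System.out.println(Integer.toHexString(isoSec.charAt(2))); // b0

        System.out.println(utfFir.equals(utfSec)); // true
        System.out.println(isoFir.equals(isoSec)); // false

        System.out.println(isValid(valueFir, StandardCharsets.UTF_8));      // false
        System.out.println(isValid(valueFir, StandardCharsets.ISO_8859_1)); // true
    }
}
```

If you need to tell such arrays apart, compare the byte arrays directly with Arrays.equals, or decode with a charset such as ISO-8859-1 in which every byte is valid.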
Upvotes: 4