Reputation: 11881
The String s and byte[] b in the code below contain different representations of roughly the same thing.
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import org.testng.Assert;
import org.testng.annotations.Test;

public class Utf8Test {
    @Test
    public void test() throws UnsupportedEncodingException {
        String s = "â€™";
        byte[] b = new byte[] { (byte) 0xE2, (byte) 0x80, (byte) 0x99 };
        System.out.println(s); // prints â€™
        String t = new String(b, Charset.forName("UTF-8"));
        System.out.println(t); // prints ’
        String u = new String(s.getBytes("ISO-8859-1"), Charset.forName("UTF-8"));
        System.out.println(u); // prints ???
        byte[] b2 = new byte[s.length()];
        for (int i = 0; i < s.length(); ++i) {
            b2[i] = (byte) (s.charAt(i) & 0xFF);
        }
        String v = new String(b2, Charset.forName("UTF-8"));
        System.out.println(v); // prints ?"
        Assert.assertEquals(s, v); // FAIL
    }
}
How can I convert String s to the same value as String t?
I have already tried the code resulting in String u and String v, and the result is indicated in the comments.
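To see why both attempts fail, it helps to dump the bytes each one produces. The sketch below assumes s holds the three characters â€™ (U+00E2, U+20AC, U+2122), which is what the outputs in the comments imply. The class name FailedAttempts and the helper hex are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class FailedAttempts {
    // Hypothetical helper: formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    // Attempt u: ISO-8859-1 cannot represent U+20AC or U+2122,
    // so getBytes substitutes '?' (0x3F) for each of them.
    static byte[] latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Attempt v: keeps only the low 8 bits of each UTF-16 code unit.
    static byte[] maskedBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); ++i) {
            out[i] = (byte) (s.charAt(i) & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "\u00E2\u20AC\u2122"; // assumed content of s: â€™
        System.out.println(hex(latin1Bytes(s))); // E2 3F 3F
        System.out.println(hex(maskedBytes(s))); // E2 AC 22
    }
}
```

Neither result is E2 80 99, so neither can decode to ’ as UTF-8. The second and third characters lie outside the Latin-1 range, so the bytes 0x80 and 0x99 cannot be recovered from their code points by truncation or by a Latin-1 encode.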
XY Problem
This is actually an XY Problem. The String s is being returned in the HttpEntity of an HttpClient call. All I want is the properly decoded response. The above is far easier to reproduce than a whole HTTP stack, so let's solve that instead.
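One plausible explanation, offered as an assumption rather than a diagnosis, is that the response body is valid UTF-8 but is being decoded with a different charset somewhere in the stack. The sketch below models that with plain JDK classes only. BODY is a hypothetical stand-in for the raw response bytes, and no HttpClient API is used or implied.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ResponseDecoding {
    // Hypothetical stand-in for the raw body bytes of the HTTP response.
    static final byte[] BODY = { (byte) 0xE2, (byte) 0x80, (byte) 0x99 };

    // Decodes the same bytes with whichever charset the caller picks.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Decoded as UTF-8, the three bytes form one character, U+2019.
        System.out.println(decode(BODY, StandardCharsets.UTF_8));
        // Decoded as windows-1252, each byte becomes its own character.
        System.out.println(decode(BODY, Charset.forName("windows-1252")));
    }
}
```

If that is what is happening, the more robust fix is to decode the raw bytes as UTF-8 in the first place, instead of repairing the String afterwards.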
Upvotes: 0
Views: 1586
Reputation: 11881
This seems to work, but I don't understand why, and I worry it may be platform-dependent:
byte[] d = s.getBytes("cp1252");
String w = new String(d, Charset.forName("UTF-8"));
System.out.println(w); // prints ’
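A likely reason this works: windows-1252 maps € to byte 0x80 and ™ to byte 0x99, so encoding s with it reproduces the original bytes E2 80 99, which then decode correctly as UTF-8. Because both charsets are named explicitly, the result does not depend on the platform default charset. Note that windows-1252 is not among the charsets the Java specification guarantees, although common JDKs ship it. The sketch below uses a hypothetical class name, MojibakeRepair, and assumes the garbled text was produced by decoding UTF-8 bytes as windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    // Looked up by name; throws UnsupportedCharsetException if the JDK lacks it.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Re-encodes the garbled text to recover the original bytes,
    // then decodes those bytes as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u00E2\u20AC\u2122"; // â€™
        System.out.println(repair(s).equals("\u2019")); // true
    }
}
```

One caveat: windows-1252 leaves a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so text whose UTF-8 encoding contains those bytes may not survive the round trip.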
Upvotes: 1