Reputation: 3
I have a problem converting from Windows-1252 to UTF-8.
I have a string encoded in Windows-1252 (e.g. the character ¢). I would like to obtain the same symbol encoded in UTF-8. That is, the source and destination characters should always look the same (¢) but use different encodings.
Is this possible? Also, is there a Java function that performs this conversion automatically (e.g. by passing the source and target encodings)?
Upvotes: 0
Views: 6566
Reputation: 108969
You can transcode between various encodings using strings as an intermediary:
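import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

byte[] windows1252 = { (byte) 0xA2 };  // ¢ in windows-1252
String utf16 = new String(windows1252, Charset.forName("windows-1252"));  // decode to Java's internal UTF-16
byte[] utf8 = utf16.getBytes(StandardCharsets.UTF_8);  // re-encode as UTF-8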
char data is always UTF-16 in Java.
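If you want a single call that takes the source and target encodings, the same decode-then-encode steps can be wrapped in a small helper. This is only a sketch: the class name Transcoder and the method transcode are made up for illustration and are not part of the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Transcoder {
    // Hypothetical helper (not a JDK method): decode the bytes with the
    // source charset into a String, then encode that String with the target.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        byte[] cent1252 = { (byte) 0xA2 }; // ¢ in windows-1252
        byte[] centUtf8 = transcode(cent1252,
                Charset.forName("windows-1252"), StandardCharsets.UTF_8);
        // ¢ is U+00A2, which UTF-8 encodes as the two bytes 0xC2 0xA2
        System.out.println(Arrays.toString(centUtf8)); // prints [-62, -94]
    }
}
```

Note that new String(byte[], Charset) and String.getBytes(Charset) silently substitute a replacement for malformed or unmappable input. If you would rather get an error in those cases, look at CharsetDecoder and CharsetEncoder instead.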
Upvotes: 1