user3514322

Reputation: 3

Java function to convert Windows-1252 to UTF-8 while keeping the same symbol

I have a problem converting text from the Windows-1252 encoding to UTF-8.

I have a string encoded in Windows-1252 (e.g. the character ¢). I would like to obtain the same symbol, but encoded in UTF-8. In other words, the source and destination characters should always look the same (¢) but use different encodings.

Is this possible? Also, is there a Java function that performs this conversion automatically (e.g. by passing the source encoding and the target encoding)?

Upvotes: 0

Views: 6566

Answers (1)

McDowell

Reputation: 108969

You can transcode between various encodings using strings as an intermediary:

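import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
byte[] windows1252 = { (byte) 0xA2 };                                     // ¢ in Windows-1252
String utf16 = new String(windows1252, Charset.forName("windows-1252"));  // decode bytes to a String
byte[] utf8 = utf16.getBytes(StandardCharsets.UTF_8);                     // encode as UTF-8: 0xC2 0xA2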

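char and String data is always UTF-16 in Java, so only the byte arrays carry a specific encoding; the String in between is the same ¢ character in both cases.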
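As for a single function that takes the source and target encodings: I am not aware of a one-call method for this in the JDK, but the steps above wrap easily into a helper. The following is a minimal sketch; the class name `Transcode` and the method `transcode` are hypothetical names chosen for illustration, and it assumes the JVM supports the `windows-1252` charset (common, but not one of the guaranteed standard charsets).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {

    // Hypothetical helper: decodes the input bytes using the source charset,
    // then re-encodes the resulting String using the target charset.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        byte[] windows1252 = { (byte) 0xA2 }; // ¢ in Windows-1252
        byte[] utf8 = transcode(windows1252,
                Charset.forName("windows-1252"),
                StandardCharsets.UTF_8);

        // Print each resulting byte in hex; for ¢ this is C2 A2
        for (byte b : utf8) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

Note that bytes which are invalid in the source charset are silently replaced by the String constructor; if that matters, a CharsetDecoder configured to report errors may be a better fit.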

Upvotes: 1
