Reputation: 1118
When I convert a UTF-8 String with chars that are not known in 8859-1 to 8859-1 then i get question marks here and there. Sure what sould he do else!
Is there a java tool that can map a string like "İKEA" to "IKEA" and avoid ? to make the best out of it?
Upvotes: 3
Views: 1954
Reputation: 108979
For the specific example, you can:
Example:
ByteArrayOutputStream out = new ByteArrayOutputStream();
// create encoder
CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
// write data
String ikea = "\u0130KEA";
String decomposed = Normalizer.normalize(ikea, Form.NFKD);
CharBuffer cbuf = CharBuffer.wrap(decomposed);
ByteBuffer bbuf = encoder.encode(cbuf);
out.write(bbuf.array());
// verify
String decoded = new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
System.out.println(decoded);
You're still transcoding from a character set that defines 109,384 values (Unicode 6) to one that supports 256 so there will always be limitations.
Also consider a more sophisticated transformation API like ICU for features like transliteration.
Upvotes: 1