I have a text:
&Aacute; example link.
In ISO-8859-1, &Aacute; is Á.
Now I am trying to convert that &Aacute; to Á using the following code:
Charset utf8charset = Charset.forName("UTF-8");
Charset iso88591charset = Charset.forName("ISO-8859-1");
ByteBuffer inputBuffer = ByteBuffer.wrap(text.getBytes());
CharBuffer data = iso88591charset.decode(inputBuffer);
ByteBuffer outputBuffer = utf8charset.encode(data);
byte[] outputData = outputBuffer.array();
return new String(outputData);
But it doesn't convert that &Aacute; to Á.
Is there any way to achieve this?
Also, given a String, can we determine which Charset it is in?
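On the second question: a Java String has no charset. It is a sequence of UTF-16 chars, and a charset only describes how bytes map to characters. So the question only makes sense for a byte array, and even then the charset cannot be detected reliably; at most you can check whether the bytes decode cleanly under a candidate charset. The sketch below does that with a strict `CharsetDecoder`. The class and helper names (`CharsetCheck`, `decodesCleanly`) are illustrative, not from any library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Hypothetical helper: true if the bytes form a valid sequence in the given charset.
    // String's constructors silently replace bad input; a decoder set to REPORT throws instead.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xC1};            // "Á" encoded as ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0x81}; // "Á" encoded as UTF-8
        System.out.println(decodesCleanly(utf8, StandardCharsets.UTF_8));        // true
        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));      // false
        System.out.println(decodesCleanly(latin1, StandardCharsets.ISO_8859_1)); // true
    }
}
```

Note that ISO-8859-1 maps every byte value to a character, so it always "decodes cleanly"; a positive result means "possible", never "correct".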
Upvotes: 2
I think you have confused character encodings (UTF-8, ISO-8859-1, ...) with HTML character entities (&Aacute;, &Ouml;, etc.).
Check out the unescapeHtml function of Apache Commons StringEscapeUtils; I assume it will do what you want.
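For reference, that method is `StringEscapeUtils.unescapeHtml` in Commons Lang 2.x; in Commons Lang 3 it is `unescapeHtml4`, which is deprecated there in favour of `org.apache.commons.text.StringEscapeUtils.unescapeHtml4` in Commons Text. To show what entity decoding does without pulling in a dependency, here is a hand-rolled stand-in, not the library's implementation. It is a sketch with a deliberately tiny, hypothetical entity table (`NAMED`) plus decimal and hex numeric references; a real library knows the full HTML entity list.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecode {
    // Hypothetical, deliberately small table; a real library covers every HTML entity.
    private static final Map<String, String> NAMED = Map.of(
            "Aacute", "\u00C1",
            "Ouml", "\u00D6",
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"");

    // Matches &#xHEX; (up to 6 digits), &#DECIMAL; (up to 7 digits), or &name;
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[a-zA-Z][a-zA-Z0-9]*);");

    // Converts a code point to a String, or keeps the original text if it is out of range.
    private static String fromCodePoint(int cp, String original) {
        return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : original;
    }

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(Integer.parseInt(body.substring(2), 16), m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(Integer.parseInt(body.substring(1)), m.group());
            } else {
                replacement = NAMED.getOrDefault(body, m.group()); // unknown names stay as-is
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("&Aacute; example link.")); // Á example link.
    }
}
```

With the library on the classpath, the equivalent call would simply be `StringEscapeUtils.unescapeHtml4(text)`.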
Upvotes: 5