Tapas Bose

Reputation: 29816

Get Actual Character from ISO-8859-1 Character

I have a text:

&Aacute; example link.

In ISO-8859-1, Á is &Aacute;.

Now I am trying to convert that &Aacute; to Á using the following code:

Charset utf8charset = Charset.forName("UTF-8");
Charset iso88591charset = Charset.forName("ISO-8859-1");

ByteBuffer inputBuffer = ByteBuffer.wrap(text.getBytes());

CharBuffer data = iso88591charset.decode(inputBuffer);

ByteBuffer outputBuffer = utf8charset.encode(data);
byte[] outputData = outputBuffer.array();
return new String(outputData);

But it does not convert that &Aacute; to Á.

Is there any way to achieve this?

Also, given a String, can we determine which Charset it is in?

Upvotes: 2

Views: 1402

Answers (1)

Anders Lindahl

Reputation: 42890

I think you have confused character encodings (UTF-8, ISO-8859-1, ...) with HTML character entities (&Aacute;, &Ouml;, etc.).
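To illustrate the difference, here is a minimal sketch (the class and method names are made up for this example) that pushes the literal text `&Aacute; example link.` through an ISO-8859-1 to UTF-8 round trip. The entity is just eight plain ASCII characters, which have the same bytes in both encodings, so re-encoding cannot turn it into Á:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the question's code.
class CharsetRoundTrip {

    // Interpret the text as ISO-8859-1 bytes, then re-encode those characters as UTF-8.
    static String latin1ToUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        byte[] utf8Bytes = decoded.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The entity is made of ASCII characters only, so it survives unchanged.
        System.out.println(latin1ToUtf8("&Aacute; example link."));
        // prints "&Aacute; example link."
    }
}
```

Changing the charset only changes how characters map to bytes. It never interprets `&...;` sequences, which is a separate HTML-level step.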

Check out the unescapeHtml function of Apache Commons StringEscapeUtils, I assume it will do what you want.
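Note that the method name depends on the library version: it is `unescapeHtml` in Commons Lang 2.x, and `unescapeHtml4` in Commons Lang 3 and in Commons Text. If adding a dependency is not an option, the idea can be sketched with the standard library alone. The code below is only an illustration, not how Commons implements it: `EntityDecoder`, `unescape`, and the tiny entity table are hypothetical names chosen for this example, and a real decoder would need the full HTML entity list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes a small subset of HTML entities.
class EntityDecoder {

    // Illustrative subset only; the real HTML entity table is much larger.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("Aacute", "\u00C1");
        NAMED.put("Ouml", "\u00D6");
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
    }

    // Matches named (&Aacute;), decimal (&#193;) and hexadecimal (&#xC1;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10, m.group());
            } else {
                // Unknown names are left exactly as they appeared in the input.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a numeric reference to text; falls back to the original on bad input.
    private static String fromCodePoint(String digits, int radix, String original) {
        try {
            return new String(Character.toChars(Integer.parseInt(digits, radix)));
        } catch (IllegalArgumentException e) {
            // Covers NumberFormatException and invalid code points.
            return original;
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("&Aacute; example link."));
    }
}
```

With this sketch, `EntityDecoder.unescape("&Aacute; example link.")` returns the string starting with the actual character Á, while entities missing from the table pass through untouched.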

Upvotes: 5
