michaelsmith

Reputation: 1051

Android HTML decoding

I am confused about HTML text that I need to decode before displaying it to the user. I do:

result = Html.fromHtml(temp).toString();

where temp contains something like "B \u0026 M Collision Repair". However, after execution result contains exactly the same text as temp. What am I missing here?

Upvotes: 4

Views: 13941

Answers (5)

Williaan Lopes

Reputation: 1367

String firstName = "Some Name";
String secondName = "Other Name";
String concatStrings = firstName + " \u25CF " + secondName;
textView.setText(Html.fromHtml("<font color='#2c51be'>Name: </font>" + concatStrings));

The Unicode escape \u25CF produces ●

Use this LINK to get unicode symbols

Upvotes: 0

Alex

Reputation: 5979

Some clarification:

  • "B \u0026 M Collision Repair" is not HTML.
  • "B &#x0026; M Collision Repair" is HTML.

Java to HTML

"B \u0026 M Collision Repair" is not HTML. It is a Java string literal, that is, how you write a string in Java source code. Inside the resulting string, Unicode characters are stored as decoded raw characters. The \u notation is only used to escape Unicode characters when writing the literal; the string is not stored that way. Side note: because the ampersand is in the ISO-8859-1 range, it does not need to be escaped like this. "B & M Collision Repair" is the same thing in Java.

Converting Java strings to HTML is common, and should be done in order to display Java strings in a web browser. This would be called encoding HTML.
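
As a quick check of the point that the \u notation exists only in source code, here is a minimal plain-Java sketch (no Android classes needed). The class and constant names are made up for illustration. It also shows a string that really does contain a backslash followed by u0026, which is what you would have if the text arrived from somewhere else with the escape still unprocessed.

```java
final class EscapeDemo {
    // The compiler turns the escape into '&' before the string exists.
    static final String ESCAPED = "B \u0026 M Collision Repair";
    static final String PLAIN = "B & M Collision Repair";
    // Doubling the backslash keeps six literal characters
    // (backslash, u, 0, 0, 2, 6) in the string instead of '&'.
    static final String LITERAL = "B \\u0026 M Collision Repair";

    public static void main(String[] args) {
        System.out.println(ESCAPED.equals(PLAIN));                 // true
        System.out.println(LITERAL.equals(PLAIN));                 // false
        System.out.println(LITERAL.length() - PLAIN.length());     // 5
    }
}
```

If temp looks unchanged after Html.fromHtml, it is worth checking which of these two cases you actually have.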

To convert Java string to HTML, thereby encoding Java raw unicode characters to HTML entities:

String java = "B \u0026 M Collision Repair";
// => "B \u0026 M Collision Repair" (as written in source)
// => "B & M Collision Repair" (actual contents)

String html = Html.escapeHtml(java);
// => "B &amp; M Collision Repair"
// (equivalent to "B &#x0026; M Collision Repair")

// or: toHtml expects a Spanned, so wrap the string first
String htmlFromSpanned = Html.toHtml(new SpannableString(java));
// => the same escaped text, wrapped in paragraph markup

HTML to Java

"B &#x0026; M Collision Repair" is HTML. Unicode characters are stored as encoded character entities. The &#x...; notation escapes Unicode characters so they can be transmitted in a limited encoding such as ISO-8859-1. A web browser decodes them to display the actual Unicode characters.

Converting HTML to Java strings is less common, and is usually reserved for scraping or parsing HTML for storage or display in some system that does not support HTML. This would be called decoding HTML.

To convert HTML to Java string, thereby decoding HTML entities to Java raw unicode characters:

String html = "B &#x0026; M Collision Repair";
// => "B &#x0026; M Collision Repair"

String java = Html.fromHtml(html).toString();
// => "B \u0026 M Collision Repair" (as you would write it in source)
// => "B & M Collision Repair" (actual contents)
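
To make the decoding step concrete outside Android, here is a rough plain-Java sketch of what decoding numeric character references involves. This is not how Html.fromHtml is implemented; the class and method names are hypothetical, and it only handles numeric references such as &#38; or &#x0026;, not named entities like &amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntityDecoder {
    // Matches numeric character references such as &#38; or &#x0026;.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Replaces each numeric reference with the character it denotes.
    // References to invalid code points are left untouched.
    static String decode(String html) {
        Matcher m = NUMERIC_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: B & M Collision Repair
        System.out.println(decode("B &#x0026; M Collision Repair"));
    }
}
```

On Android, Html.fromHtml remains the simpler choice because it also handles named entities and tags.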

Upvotes: 16

sphere4a

Reputation: 125

The \u0026 is a Unicode escape that is not getting translated. Suggestion:

String temp = "<html>B \u0026 M Collision Repair</html>";
String result = Html.fromHtml(temp).toString();

Upvotes: 0

Zoltán

Reputation: 22166

Try this class.

result = URLDecoder.decode(temp, "UTF-8");

Upvotes: 0

Andro Selva

Reputation: 54322

I had the same issue. Try this:

Spanned ss = Html.fromHtml(yourString);
String tempString = ss.toString();

Upvotes: 1
