edoardotognoni

Reputation: 2752

HTML parse special chars in Android

I have a simple problem: when I retrieve the text of an email, Html.fromHtml sometimes fails to parse the string correctly.

I'll give you an example. This is the HTML string:

&#‪8211‬;&#‪8211‬;&#‪8211‬;&#‪8211‬;&

It needs to be something like this:

–––––––––––––––––––––––––––

Is there a way in Android to achieve that? Do I need to use Regular Expressions?

Thank you so much.

Upvotes: 0

Views: 1347

Answers (2)

Esailija

Reputation: 140220

In this case, you can filter out the hidden characters (U+202A and U+202C, invisible directional formatting marks embedded inside the entities) with:

myString = myString.replaceAll( "[\\u202C\\u202A]", "" );

After that it's just:

Html.fromHtml(myString);

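The result will then work in an HTML context. If you want the actual dash characters instead, decode twice (Html.fromHtml returns a Spanned, so convert it back to a String first):

Html.fromHtml(Html.fromHtml(myString).toString());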

Demo of the concept: http://jsfiddle.net/CGzDc/ (JavaScript; for Java, use the code in this answer)
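If you want to check the filtering step without an Android device, here is a minimal plain-Java sketch. The class and method names (HiddenCharFilter, stripBidiControls) are made up for illustration, and the sample input is an assumption modelled on the string in the question: a numeric entity with U+202A and U+202C wrapped around the digits.

```java
import java.util.regex.Pattern;

class HiddenCharFilter {
    // U+202A (LEFT-TO-RIGHT EMBEDDING) and U+202C (POP DIRECTIONAL FORMATTING)
    // are invisible, but they still count as characters inside the entity.
    private static final Pattern BIDI_CONTROLS = Pattern.compile("[\\u202A\\u202C]");

    // Hypothetical helper: removes the two hidden characters and nothing else.
    static String stripBidiControls(String s) {
        return BIDI_CONTROLS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumed sample: looks like an ordinary numeric entity when printed.
        String raw = "&#\u202A8211\u202C;";
        String clean = stripBidiControls(raw);

        // The lengths differ by two even though both strings look the same.
        System.out.println(raw.length() + " -> " + clean.length());
        System.out.println(clean);
    }
}
```

On Android you would then pass the cleaned string to Html.fromHtml as shown above. If your mail source embeds other invisible characters, the character class would need to be extended accordingly.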

Upvotes: 2

Jukka K. Korpela

Reputation: 201568

The string in your example is HTML notation for &#8211;&#8211;&#8211;& (literally), so the correct browser behavior is to render it that way. For some reason that cannot be guessed from the description, some software has applied double encoding: it first encoded the en dash “–” as &#8211; and then encoded the & again, as &amp;.
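To make the double encoding concrete, here is a rough plain-Java sketch that undoes exactly one layer of encoding per call. It is not how Html.fromHtml or a browser is implemented; it only handles &amp; and decimal references such as &#8211;, and the names DoubleDecodeSketch and decodeOnce are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubleDecodeSketch {
    // Matches either "&amp;" or a decimal reference such as "&#8211;".
    private static final Pattern REF = Pattern.compile("&(?:amp|#(\\d{1,7}));");

    // Decodes one layer only: every match is replaced in a single pass,
    // so text produced by a replacement is never decoded again.
    static String decodeOnce(String s) {
        Matcher m = REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) == null) {
                replacement = "&"; // the match was "&amp;"
            } else {
                int codePoint = Integer.parseInt(m.group(1));
                replacement = Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint))
                        : m.group(); // leave invalid references untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String doubled = "&amp;#8211;&amp;#8211;&amp;#8211;";
        String once = decodeOnce(doubled); // still entity notation
        String twice = decodeOnce(once);   // the actual dash characters
        System.out.println(once);
        System.out.println(twice);
    }
}
```

The first call turns each &amp;#8211; into the literal text &#8211;, which is what a correct HTML renderer displays. Only a second decoding pass produces the dash itself, which is why a single Html.fromHtml call is not enough for double-encoded input.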

By the way, a sequence of consecutive dashes may or may not produce a continuous line, depending on the font. There are more reliable ways to produce long lines, such as the <hr> element and border properties in CSS.

Upvotes: 2
