Hevv

How to unescape HTML entities in Java?

In my database, I have an escaped HTML string that contains, among other HTML entities, 𝐿.

If I display it in an HTML file, I see the symbol I expect: 𝐿.

According to https://www.compart.com/en/unicode/U+1D43F, it is called "Mathematical Italic Capital L" and can also be represented as the HTML entity 𝐿. However, I want to display that string in a PDF file, so I am trying to unescape 𝐿 in my Java application.
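For reference, here is a minimal sketch of how I understand this code point to be represented in Java. The class and method names (`CodePointDemo`, `decimalEntity`, `asString`) are made up for illustration and are not from any library. It only shows that U+1D43F lies outside the Basic Multilingual Plane, so a Java `String` holds it as a surrogate pair of two `char` values; it is not a solution to the unescaping problem.

```java
public class CodePointDemo {
    // U+1D43F, "Mathematical Italic Capital L" (code point taken from the compart.com page).
    static final int CODE_POINT = 0x1D43F;

    // Builds the decimal numeric character reference for a code point.
    static String decimalEntity(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Converts a code point to a Java String; supplementary code points take two chars.
    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = asString(CODE_POINT);
        System.out.println(decimalEntity(CODE_POINT));                        // &#119871;
        System.out.println(s.length());                                       // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));                  // 1 (one code point)
        System.out.println((int) s.charAt(0) + " " + (int) s.charAt(1));      // 55349 56383
    }
}
```

So any unescaping method has to end up with both halves of the surrogate pair (0xD835 and 0xDC3F) in the resulting `String` for the character to come out intact.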

I have tried several methods, and none worked as I intended:

| Method | Output |
|---|---|
| org.apache.commons.lang.StringEscapeUtils.unescapeHtml("𝐿") | |
| org.apache.commons.lang.StringEscapeUtils.unescapeXml("𝐿") | |
| org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4("𝐿") | ݐ¿ |
| org.apache.commons.lang3.StringEscapeUtils.unescapeXml("𝐿") | ݐ¿ |
| org.jsoup.nodes.Entities.unescape("𝐿") | ݐ¿ |
| org.jsoup.parser.Parser.unescapeEntities("𝐿", true) | ݐ¿ |
| org.unbescape.html.HtmlEscape.unescapeHtml("𝐿") | ݐ¿ |

Is there a way in Java to unescape this kind of HTML entity exactly as a browser would?

Note: I don't know which method or library was used to produce the escaped HTML entity.
