emilly

Reputation: 10530

How to unescape Jsoup document?

I have an HTML file with the following content:

<html>
    <title><s:message code="test" /></title>
</html>

Java Program:

String input = readFileAsString(filePath);
Document doc = Jsoup.parse(input);

Elements messageEls = doc.select("s|message");

I see the following output:

<html>
 <head>
  <title>&lt;s:message code="test" /&gt;</title> 
 </head>
 <body> 
 </body>

Somehow the character < is converted to &lt;. How can I get the original content without escaping? I need to find the <s:message> elements, but because of the escaping, the selector does not find <s:message code="test" />.

Upvotes: 1

Views: 1231

Answers (1)

Mạnh Quyết Nguyễn

Reputation: 18235

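Jsoup escapes it because the HTML parser does not parse <s:message /> as an element here: it is not a standard HTML tag, and the contents of <title> are treated as plain text.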

Try the XML parser instead:

Document doc = Jsoup.parse(input, "", Parser.xmlParser());

From the Javadoc for Parser.xmlParser(): "Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat it as HTML, rather creates a simple tree directly from the input."
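To put the pieces together, here is a minimal sketch that parses the HTML from the question with the XML parser and then runs the same s|message selector. The class name XmlParserSketch and the helper messageCodes are made up for illustration, and the input is inlined rather than read with the question's readFileAsString; it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class XmlParserSketch {

    // Hypothetical helper: collects the "code" attribute of every <s:message> element.
    static List<String> messageCodes(String input) {
        // Empty base URI, as in the answer. The XML parser builds the tree
        // directly from the input, so <s:message> stays an element.
        Document doc = Jsoup.parse(input, "", Parser.xmlParser());

        List<String> codes = new ArrayList<>();
        // "s|message" is jsoup's selector syntax for a tag named s:message.
        for (Element el : doc.select("s|message")) {
            codes.add(el.attr("code"));
        }
        return codes;
    }

    public static void main(String[] args) {
        // Same content as the HTML file in the question, inlined for the sketch.
        String input = "<html>\n    <title><s:message code=\"test\" /></title>\n</html>";
        System.out.println(messageCodes(input));
    }
}
```

Keep in mind that the XML parser does not add <head> or <body> or otherwise normalise the structure the way the HTML parser does, so the printed document will look different from the output in the question.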

Upvotes: 1
