How to unescape Jsoup document?

Question

I have an HTML file that contains the following content:


    <s:message code="test" />

Java Program:

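    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    // readFileAsString is my own helper that returns the file contents as a String
    String input = readFileAsString(filePath);
    Document doc = Jsoup.parse(input);
    Elements messageEls = doc.select("s|message");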

The output I see is:

    &lt;s:message code="test" /&gt;

Somehow the character < is converted to &lt;. How can I get the original content without the escaping? I actually need to find these elements, but because of the escaping, doc.select("s|message") does not find them.

Mạnh Quyết Nguyễn · Accepted Answer

Jsoup escapes it because <s:message> is not a standard HTML tag.

Try using the XML parser (org.jsoup.parser.Parser):

    Document doc = Jsoup.parse(input, "", Parser.xmlParser());

From the Javadoc of Parser.xmlParser(): "Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat it as HTML, rather creates a simple tree directly from the input."
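A minimal, self-contained sketch of this approach, assuming jsoup is on the classpath. The inline input string and the class name XmlParserExample are hypothetical stand-ins for the asker's file and program; the code only uses Jsoup.parse(String, String, Parser), Parser.xmlParser(), Element.select, attr and outerHtml.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class XmlParserExample {
    public static void main(String[] args) {
        // Hypothetical input standing in for the contents of the asker's HTML file
        String input = "<div><s:message code=\"test\" /></div>";

        // Parse with the XML parser so <s:message> is built directly into the tree as an element
        Document doc = Jsoup.parse(input, "", Parser.xmlParser());

        // "s|message" is jsoup's ns|tag selector syntax; it matches elements named s:message
        Elements messageEls = doc.select("s|message");

        for (Element el : messageEls) {
            // Read the attribute value from the matched element
            System.out.println(el.attr("code"));
            // Re-serialize just this element as markup
            System.out.println(el.outerHtml());
        }

        // Re-serialize the whole parsed tree as markup
        System.out.println(doc.outerHtml());
    }
}
```

Because the element is now part of the tree rather than escaped text, the same doc.select("s|message") call from the question returns it.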
