Reputation: 113
I'm using Jsoup to parse a short HTML document that contains some custom tags, which I need for some logic operations on the result.
Like this:
<table><showif field="xxx"><tr><td>test</test></td></tr></showif><tr><td>xyz</td></tr></table>
Document doc = Jsoup.parse(html);
Elements showif_fields = doc.select("SHOWIF[field]");
In this case the inner content seems to be lost; the outerHtml() method shows just this:
<showif value="xxx"></showif>
But if the "showif" tag contains only simple text, like hello, it works as expected.
Any ideas? Thank you.
Upvotes: 0
Views: 1537
Reputation: 10522
The issue you are running into is that the HTML spec is strict about what may appear inside a table, so your unknown tags are moved ("foster-parented") outside of the table. (Jsoup does this to follow the HTML spec, so that its output matches browser behaviour as closely as possible.)
In this case, you know what you're doing and you're creating the HTML, so you can set jsoup to ignore the HTML spec and just process the tags as it sees them. Do this with the XML parser:
Document doc = Jsoup.parse(html, baseUri, Parser.xmlParser());
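To make the difference concrete, here is a minimal sketch that parses the markup from the question with both parsers and compares what ends up inside the showif element. The class and method names are made up for illustration, the stray closing tag from the question is left out, and an empty string is assumed to be acceptable as the base URI because no relative links need resolving.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

// Hypothetical helper class, not part of Jsoup.
class ShowIfParserComparison {

    // Parses the markup with the given parser and returns the first
    // <showif> element that carries a "field" attribute.
    static Element firstShowIf(String html, Parser parser) {
        // Assumption: an empty base URI is fine, since nothing relative is resolved.
        Document doc = Jsoup.parse(html, "", parser);
        return doc.select("showif[field]").first();
    }

    public static void main(String[] args) {
        String html = "<table><showif field=\"xxx\"><tr><td>test</td></tr></showif>"
                + "<tr><td>xyz</td></tr></table>";

        // HTML parser: follows the HTML table rules, so the custom tag is moved out of the table.
        Element viaHtml = firstShowIf(html, Parser.htmlParser());
        // XML parser: keeps the tags nested exactly as written.
        Element viaXml = firstShowIf(html, Parser.xmlParser());

        System.out.println("HTML parser, child elements: " + viaHtml.children().size());
        System.out.println("XML parser, child elements: " + viaXml.children().size());
        System.out.println(viaXml.outerHtml());
    }
}
```

With the XML parser the row stays a child of showif, so the usual select() calls on that element should reach the nested cells. The trade-off is that no HTML fix-ups (such as an implied tbody) are applied, so the input has to be reasonably well-formed.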
Upvotes: 2
Reputation: 3527
The problem is that Jsoup has "sanitized" your HTML. As a quick test, I pasted your HTML into a page and viewed it with my browser (which tends to sanitize it as well). It tells me the HTML actually looks like:
<showif value="xxx"/>
<table><tbody><tr><td>test</td></tr><tr><td>xyz</td></tr></tbody></table>
That is because only a few elements are allowed directly inside a <table>, and the browser thinks you have made a mistake by placing a <showif> tag inside, and fixes this for you. I think Jsoup does something similar.
(Edit: got Jsoup running now, and indeed it creates a similar output if I look at doc.outerHtml().)
If you really need to use nonstandard things to annotate your pages, you might have better luck with nonstandard attributes, like:
<table>
<tr showif="xxx"><td>test</test></td></tr>
<tr><td>xyz</td></tr>
</table>
Then you can say:
Elements showif_fields = doc.select("*[showif]");
This gives
<tr showif="xxx">
<td>test</td>
</tr>
as the result of showif_fields.outerHtml().
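The attribute-based variant can be sketched as follows, using the default HTML parser. The class and method names are made up for illustration, and the selector is narrowed to tr[showif] on the assumption that only rows carry the marker; the *[showif] form from above would work the same way.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class, not part of Jsoup.
class ShowIfAttributeSelect {

    // Returns every table row that carries a non-standard "showif" attribute.
    static Elements rowsWithShowIf(String html) {
        // The default HTML parser is fine here: unknown attributes on valid
        // table elements are kept, unlike unknown tags placed inside a <table>.
        Document doc = Jsoup.parse(html);
        return doc.select("tr[showif]");
    }

    public static void main(String[] args) {
        String html = "<table>"
                + "<tr showif=\"xxx\"><td>test</td></tr>"
                + "<tr><td>xyz</td></tr>"
                + "</table>";

        for (Element row : rowsWithShowIf(html)) {
            // The attribute value names the field; the row content stays intact.
            System.out.println(row.attr("showif") + " -> " + row.select("td").text());
        }
    }
}
```

Because the row itself is valid table content, the parser has no reason to move it, and the cells remain reachable from the selected element.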
Alternatively, you may be better off with a different approach, e.g. a JavaScript template engine such as jQuery Templates or Mustache (among many others), which inserts the generated HTML after doing some logic, instead of loading the content on the page and fixing it up afterwards. This of course depends on your requirements, which I do not know well enough to judge whether this recommendation makes sense. Edit: no, it makes no sense, as Jsoup runs server side; see the comment below.
Upvotes: 0