Reputation: 47
As seen in the screenshot, jsoup inserts an empty TextNode before/after every other child node. Is there a way to turn that off? Or am I doing something wrong? Is there any reason why those exist?
Jsoup 1.8.2 if it helps.
Upvotes: 1
Views: 324
Reputation: 43033
Is there a way to turn that off?
No, this is the normal behavior of the Jsoup HTML parser.
Or do i do something wrong?
The method you are invoking may be the root cause of the whole issue.
Is there any kind of reason why those exist?
Each time text is found in an HTML document, it is represented as a TextNode in the final HTML object tree (Document).
If the HTML document's author puts a line break before an element, Jsoup will create a TextNode for this line break.
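To make this concrete, here is a minimal sketch that counts the whitespace-only TextNodes under an element. The class name, the helper method and the sample markup are made up for illustration; only the jsoup calls (Jsoup.parse, select, childNodes, TextNode#isBlank) come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class WhitespaceNodes {
    // Hypothetical sample markup: line breaks and indentation around each <p>.
    static final String HTML = "<div>\n  <p>One</p>\n  <p>Two</p>\n</div>";

    // Counts the direct child nodes of the first <div> that are
    // TextNodes containing nothing but whitespace.
    static int countBlankTextNodes(String html) {
        Document doc = Jsoup.parse(html);
        Element div = doc.select("div").first();
        int blank = 0;
        for (Node child : div.childNodes()) {
            if (child instanceof TextNode && ((TextNode) child).isBlank()) {
                blank++;
            }
        }
        return blank;
    }

    public static void main(String[] args) {
        System.out.println("blank text nodes: " + countBlankTextNodes(HTML));
    }
}
```

If the same markup is written on a single line with no whitespace between the tags, the count should drop to zero, which shows that the extra nodes come from the document's formatting rather than from Jsoup adding anything.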
(...) childnodes().size() no longer returns correct data
I think this is the heart of the problem. Instead of calling Node#childNodes(), call Element#children(). That method returns only the child elements, so the whitespace-only TextNodes are left out for you.
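The difference between the two calls can be sketched as follows. The class, the helper names and the sample markup are assumptions for illustration. Note that children() drops every non-element node, including text that is not blank, so the last helper shows one possible way to discard only the whitespace-only TextNodes when real text still matters.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ChildCounts {
    // Hypothetical sample markup with whitespace between the elements.
    static final String HTML = "<div>\n  <p>One</p>\n  <p>Two</p>\n</div>";

    // Returns the first <div> of the parsed document.
    static Element firstDiv(String html) {
        return Jsoup.parse(html).select("div").first();
    }

    // Counts all direct child nodes, whitespace TextNodes included.
    static int nodeCount(String html) {
        return firstDiv(html).childNodes().size();
    }

    // Counts direct child elements only.
    static int elementCount(String html) {
        return firstDiv(html).children().size();
    }

    // Keeps every child node except whitespace-only TextNodes.
    static List<Node> significantNodes(Element parent) {
        List<Node> kept = new ArrayList<>();
        for (Node child : parent.childNodes()) {
            boolean blankText = child instanceof TextNode && ((TextNode) child).isBlank();
            if (!blankText) {
                kept.add(child);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println("childNodes(): " + nodeCount(HTML));
        System.out.println("children():   " + elementCount(HTML));
        System.out.println("significant:  " + significantNodes(firstDiv(HTML)).size());
    }
}
```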
Alternatively, you can use the following code snippet to get the data:
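import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document doc = ...
// The selector matches <td> elements only, so no TextNode can end up in the result.
Elements tds = doc.select("table tr > td");
if (tds.isEmpty()) {
    // No td found ...
}
// ...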
Jsoup 1.8.2
Upvotes: 1