NekoiNemo

Reputation: 47

Jsoup inserts empty TextNodes

As seen in the screenshot, jsoup inserts an empty TextNode before/after every other child node. Is there a way to turn that off? Or am I doing something wrong? Is there any kind of reason why those exist?

Jsoup 1.8.2 if it helps.

[Screenshot: Example]

Upvotes: 1

Views: 324

Answers (1)

Stephan

Reputation: 43033

Is there a way to turn that off?

No, this is the normal behavior of the jsoup HTML parser.

Or am I doing something wrong?

The method you are calling may be the root cause of the whole issue.

Is there any kind of reason why those exist?

Each time text is found in an HTML document, it is represented as a TextNode in the resulting object tree (Document). That includes text consisting only of whitespace.

If the HTML document's author puts a line break before an element, Jsoup will create a TextNode for this line break.
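To make this concrete, here is a minimal sketch that parses a small fragment and lists the direct children of one element. The markup, the class name WhitespaceNodesDemo and the helper countBlankTextNodes are made up for illustration; only Jsoup.parse, childNodes(), nodeName() and TextNode#isBlank() come from jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class WhitespaceNodesDemo {

    // Hypothetical sample markup: each <p> sits on its own indented line.
    static final String HTML =
            "<div id=\"box\">\n  <p>one</p>\n  <p>two</p>\n</div>";

    // Counts the direct children of `parent` that are whitespace-only TextNodes.
    static int countBlankTextNodes(Element parent) {
        int count = 0;
        for (Node child : parent.childNodes()) {
            if (child instanceof TextNode && ((TextNode) child).isBlank()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element box = doc.getElementById("box");

        // Print the type of every direct child to make the text nodes visible.
        for (Node child : box.childNodes()) {
            System.out.println(child.getClass().getSimpleName() + " " + child.nodeName());
        }
        System.out.println("blank text nodes: " + countBlankTextNodes(box));
    }
}
```

The line breaks and indentation around the two paragraphs show up as TextNodes between the Element children, which is most likely what the debugger view in the question is showing.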

(...) childNodes().size() no longer returns correct data

I think the heart of the problem is here. Instead of calling Node#childNodes(), call the Element#children() method. It returns only the child elements, so the whitespace-only TextNodes are left out of the count.
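The difference between the two calls can be sketched like this. The list markup, the class name ChildrenVsChildNodes and the two helper methods are assumptions for the example, not something from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ChildrenVsChildNodes {

    // Hypothetical sample markup with line breaks between the list items.
    static final String HTML =
            "<ul id=\"list\">\n  <li>a</li>\n  <li>b</li>\n  <li>c</li>\n</ul>";

    // All direct child nodes, including whitespace-only TextNodes.
    static int nodeCount(Element parent) {
        return parent.childNodes().size();
    }

    // Only the direct children that are Elements.
    static int elementCount(Element parent) {
        return parent.children().size();
    }

    public static void main(String[] args) {
        Element list = Jsoup.parse(HTML).getElementById("list");
        System.out.println("childNodes(): " + nodeCount(list));
        System.out.println("children():   " + elementCount(list));
    }
}
```

With this input, children() should report the three li elements, while childNodes() also counts the whitespace between them, so the two sizes differ.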

Alternatively, you can use the following code snippet to get the data:

Document doc = ...
// Select only the <td> elements that are direct children of a table row
Elements tds = doc.select("table tr > td");
if (tds.isEmpty()) {
    // No td found ...
}

// ...

Jsoup 1.8.2
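For anyone who wants to try the selector approach end to end, here is a self-contained sketch. The table markup, the class name SelectCellsDemo and the helper selectCells are hypothetical; the selector is the one from the snippet above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectCellsDemo {

    // Hypothetical sample table, formatted with line breaks and indentation.
    static final String HTML =
            "<table>\n"
          + "  <tr>\n"
          + "    <td>a</td>\n"
          + "    <td>b</td>\n"
          + "  </tr>\n"
          + "</table>";

    // Returns the <td> elements that are direct children of a table row.
    static Elements selectCells(Document doc) {
        return doc.select("table tr > td");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Elements tds = selectCells(doc);
        if (tds.isEmpty()) {
            System.out.println("No td found");
        } else {
            System.out.println(tds.size() + " cells, first: " + tds.first().text());
        }
    }
}
```

Because select() only ever returns elements, the whitespace between the cells does not affect the result.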

Upvotes: 1
