Nick Betcher

Reputation: 2046

How can I remove non-breaking spaces from a JSoup 'Document'?

How can I remove these:

<td>&nbsp;</td>

or

<td width="7%">&nbsp;</td>

from my JSoup 'Document'? I've tried many methods, but these non-breaking space characters do not match anything with normal JSoup expressions or Selectors.

Upvotes: 7

Views: 8044

Answers (1)

BalusC

Reputation: 1108682

The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming you want to remove every element that contains that character in its own text (rather than every line, as you said in a comment), the following ought to work:

document.select(":containsOwn(\u00a0)").remove();

If you really do mean to remove the entire line, your best bet is to scan the HTML yourself, line by line.
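
A minimal sketch of such a line-by-line scan, using only the Java standard library and no Jsoup at all. The class name NbspLineFilter and the method removeNbspLines are hypothetical names chosen for illustration, and the sketch assumes the raw HTML is already available as a String and that a "line" is delimited by \n or \r\n. It drops a line if it contains either the literal entity &nbsp; or an already-decoded U+00A0 character:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jsoup: filters raw HTML text line by line.
class NbspLineFilter {
    // NO-BREAK SPACE (U+00A0), the character that the &nbsp; entity decodes to.
    private static final char NBSP = '\u00a0';

    /** Returns the input without any line containing "&nbsp;" or a raw U+00A0. */
    static String removeNbspLines(String html) {
        List<String> kept = new ArrayList<>();
        // Split on LF or CRLF; the -1 limit preserves trailing empty lines.
        for (String line : html.split("\r?\n", -1)) {
            boolean hasNbsp = line.contains("&nbsp;") || line.indexOf(NBSP) >= 0;
            if (!hasNbsp) {
                kept.add(line);
            }
        }
        // Lines are rejoined with LF, so CRLF input is normalised to LF.
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String html = "<tr>\n"
                + "<td>&nbsp;</td>\n"
                + "<td width=\"7%\">&nbsp;</td>\n"
                + "<td>data</td>\n"
                + "</tr>";
        System.out.println(removeNbspLines(html));
    }
}
```

Note that this works on the markup text before parsing, so it only does what you want when each unwanted cell sits on a line of its own, as in the question's examples. A line that mixes a non-breaking space with content you want to keep would be removed in full.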

Upvotes: 15
