Reputation: 2046
How can I remove these:
<td>&nbsp;</td>
or
<td width="7%">&nbsp;</td>
from my Jsoup 'Document'? I've tried many methods, but normal Jsoup expressions and selectors do not match these non-breaking space characters.
Upvotes: 7
Views: 8044
Reputation: 1108682
The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming that you want to remove every element that contains that character in its own text (and thus not every line, as you said in a comment), the following ought to work:
document.select(":containsOwn(\u00a0)").remove();
If you really do mean to remove the entire line, then your best bet is to scan the HTML yourself, line by line.
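A minimal sketch of that line-by-line scan, using only the standard library and assuming Java 11 or later (for String.lines()). The class name NbspCellLineFilter and the method removeNbspCellLines are hypothetical names chosen for illustration. The sketch also assumes each unwanted cell sits alone on its own line, and that the raw HTML may contain either the literal U+00A0 character or the &nbsp; entity:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Jsoup: works on the raw HTML text before parsing.
final class NbspCellLineFilter {

    // A line that holds exactly one <td> (with any attributes) whose content is
    // only no-break spaces (literal U+00A0 or the &nbsp; entity) and whitespace.
    private static final Pattern NBSP_ONLY_CELL = Pattern.compile(
            "\\s*<td\\b[^>]*>(?:\\s*(?:\\u00a0|&nbsp;))+\\s*</td>\\s*",
            Pattern.CASE_INSENSITIVE);

    // Returns the HTML with every matching line dropped.
    // Remaining lines are re-joined with "\n"; a trailing newline is not preserved.
    static String removeNbspCellLines(String html) {
        return html.lines()
                .filter(line -> !NBSP_ONLY_CELL.matcher(line).matches())
                .collect(Collectors.joining("\n"));
    }
}
```

The filtered string can then be handed to Jsoup.parse(...) as usual. A regex like this is fragile against HTML where several cells share a line, so it only suits markup with a predictable layout.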
Upvotes: 15