Reputation: 775
I was trying to use this code to strip all HTML elements from my text:
Jsoup.clean(preparedText, Whitelist.none())
Unfortunately, it didn't remove the &nbsp; entities. I thought it would replace them with whitespace, the same way it replaces &middot; with a middle dot ("·").
Should I use another method in order to achieve this functionality?
Upvotes: 10
Views: 4838
Reputation: 11712
From the Jsoup docs:
Whitelists define what HTML (elements and attributes) to allow through the cleaner. Everything else is removed.
So whitelists are concerned only with tags and attributes.
&nbsp; is neither a tag nor an attribute. It is simply the HTML encoding of a special character. If you want to translate from the encoding to normal text, you can use, for example, the excellent Apache Commons Lang library or Jsoup's unescapeEntities method:
System.out.println(Parser.unescapeEntities(doc.toString(), false));
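Note that unescaping &nbsp; gives you the non-breaking space character (U+00A0), not a regular space. If you want plain spaces, one more replacement step is needed. Here is a minimal sketch using only the standard library; the class name NbspNormalizer and the method normalize are made up for illustration, and the input is assumed to be text that has already been unescaped:

```java
public class NbspNormalizer {
    // U+00A0 is the character that the &nbsp; entity decodes to.
    private static final char NBSP = '\u00A0';

    /** Replaces every non-breaking space with a regular space. */
    public static String normalize(String decodedText) {
        return decodedText.replace(NBSP, ' ');
    }

    public static void main(String[] args) {
        // Stand-in for already unescaped text, e.g. the result of
        // Parser.unescapeEntities(cleanedHtml, false).
        String decoded = "foo\u00A0bar";
        System.out.println(normalize(decoded)); // prints "foo bar"
    }
}
```

Whether you want this depends on your use case: the non-breaking space is a legitimate character, so only replace it if downstream code (trimming, splitting on whitespace, comparisons) expects ordinary spaces.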
Addendum:
The translation from &middot; to "·" already happens when you parse the HTML. It does not seem to be related to the clean method.
Upvotes: 5