Reputation: 13
I am using jsoup to clean some HTML with Whitelist.relaxed(). This works well for the most part and I would like to continue to use it.
The problem is that I have a placeholder href value that the clean is removing. For example, <a href="{placeholder}">text</a> is changed to <a>text</a>. Is there a way to preserve the href attribute with my placeholder value?
Thanks in advance
Upvotes: 1
Views: 1544
Reputation: 11
If href is the only attribute you need to keep, you can use preserveRelativeLinks(true). But if your links also carry target="_blank" or other attributes, that alone will not keep them. So I prefer Whitelist's addAttributes(String tag, String... attributes), which lets you list the attributes to keep for each tag.
Code like this:
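import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// Start from an empty whitelist and allow only the listed tags and attributes.
Whitelist whitelist = Whitelist.none();
whitelist.addAttributes("a", "href", "target");
whitelist.addAttributes("img", "src");
String cleanText = Jsoup.clean(htmlText, whitelist);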
Upvotes: 1
Reputation: 11712
I guess you are not passing a valid base URI to the clean method. If you do, the hrefs are kept. If you also specify preserveRelativeLinks(true) on the Whitelist, the links can stay relative as well.
So when cleaning, do something like this:
String html = "<a href=\"{placeholder}\">text</a>";
String cleaned = Jsoup.clean(html,
        "http://base.uri",
        Whitelist.relaxed().preserveRelativeLinks(true));
System.out.println(cleaned);
This will result in the following output:
<a href="{placeholder}">text</a>
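To see why the base URI makes a difference, here is a minimal sketch that uses plain java.net.URL rather than jsoup. It assumes (I have not checked every jsoup version) that the cleaner resolves the href against the base URI before testing it against the allowed protocols. The class name RelativeLinkDemo and the resolve helper are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeLinkDemo {

    // Hypothetical helper: resolves a possibly relative href against a base URI.
    // java.net.URL is used because it tolerates characters such as '{' and '}'.
    static String resolve(String baseUri, String href) {
        try {
            URL base = new URL(baseUri);
            return new URL(base, href).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + href, e);
        }
    }

    public static void main(String[] args) {
        // A relative value picks up the protocol of the base URI.
        System.out.println(resolve("http://base.uri", "{placeholder}"));
        // An absolute value keeps its own protocol.
        System.out.println(resolve("http://base.uri", "https://example.com/page"));
    }
}
```

The first call yields http://base.uri/{placeholder}, which has an http protocol and so could pass a protocol check. Without a base URI there is nothing to resolve the placeholder against, which would explain why the attribute gets dropped.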
Upvotes: 1