user6005335

Reputation: 13

Jsoup is removing the href attribute with a placeholder value

I am using jsoup to clean some HTML with Whitelist.relaxed(). This works well for the most part, and I would like to keep using it.

The problem is that the clean removes an href whose value is a placeholder.

For example, <a href="{placeholder}">text</a> is changed to <a>text</a>. Is there a way to preserve the href attribute with my placeholder value?

Thanks in advance

Upvotes: 1

Views: 1544

Answers (2)

cllcagri

Reputation: 11

If the tag has only an href attribute, you can use preserveRelativeLinks(true). But if it also has target="_blank" or other attributes, the method treats all of these attributes as one URL. So I preferred Whitelist's addAttributes(String tag, String... attributes).

Code like this :

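import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// Start from an empty whitelist and allow only the listed tags and attributes.
Whitelist whitelist = Whitelist.none();
whitelist.addAttributes("a", "href", "target");
whitelist.addAttributes("img", "src");

String cleanText = Jsoup.clean(htmlText, whitelist);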

Upvotes: 1

luksch

Reputation: 11712

I guess you are not passing a valid base URI to the clean method. If you do, the hrefs are kept. If you also set preserveRelativeLinks(true) on the Whitelist, the links can stay relative as well.

So when cleaning do something like this:

String html = "<a href=\"{placeholder}\">text</a>";
String cleaned = Jsoup.clean(html, 
                             "http://base.uri",
                             Whitelist.relaxed().preserveRelativeLinks(true));
System.out.println(cleaned);

This will result in the following output:

<a href="{placeholder}">text</a>
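
To see why the base URI matters, here is a rough sketch of the kind of check involved, written with the JDK only. It is not jsoup's actual code: the class name, the method names, and the protocol list are made up for illustration, and it assumes the cleaner resolves the href against the base URI and then tests the result against a list of allowed protocols.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

public class HrefCheckSketch {
    // Hypothetical protocol list for illustration; a real whitelist defines its own.
    static final List<String> ALLOWED_PROTOCOLS = List.of("http", "https", "ftp", "mailto");

    // Resolve href against baseUri; return "" when no absolute URL can be built.
    static String toAbsolute(String baseUri, String href) {
        try {
            return new URL(new URL(baseUri), href).toExternalForm();
        } catch (MalformedURLException e) {
            return "";
        }
    }

    // True when the resolved value (or the raw value, if resolution failed)
    // starts with one of the allowed protocols.
    static boolean hasAllowedProtocol(String baseUri, String href) {
        String value = toAbsolute(baseUri, href);
        if (value.isEmpty()) {
            value = href;
        }
        String lower = value.toLowerCase();
        for (String protocol : ALLOWED_PROTOCOLS) {
            if (lower.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // No base URI: the raw placeholder is all there is to check.
        System.out.println(hasAllowedProtocol("", "{placeholder}"));
        // With a base URI: the placeholder is resolved like a relative link.
        System.out.println(hasAllowedProtocol("http://base.uri", "{placeholder}"));
    }
}
```

Under these assumptions, the placeholder alone carries no protocol and fails the check, while the same placeholder resolved against a base URI passes it. preserveRelativeLinks(true) then keeps the original relative value in the output instead of the resolved one.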

Upvotes: 1
