Aquarelle

Reputation: 9138

How do I prevent Jsoup from removing 'href' attribute of anchor element?

I want to use Jsoup to cleanse input while still allowing anchor elements with an "href" attribute to remain untouched; however, I've found that no matter what I do, Jsoup.clean() removes the "href" attribute. Test code follows:

    import org.jsoup.Jsoup;
    import org.jsoup.safety.Safelist;

    public static void main(String[] args)
    {
        final String foo = "<a href='/foo/'>Foo</a>";
        // Safelist.relaxed() plus an explicit allowance for <a href>
        final String cleansedOutput = Jsoup.clean(foo, Safelist.relaxed().addTags("a").addAttributes("a", "href"));

        System.out.println("foo: " + foo);
        System.out.println("cleansedOutput: " + cleansedOutput);
    }

The output of the code is as follows:

    foo: <a href='/foo/'>Foo</a>
    cleansedOutput: <a>Foo</a>

As you can see, the "href" attribute is stripped even when, as shown above, I explicitly tell Jsoup to preserve anchor elements and the "href" attribute (I initially used the default Safelist.relaxed() before adding addTags() and addAttributes(); they all removed the attribute regardless).

Am I doing something wrong? Or is this a bug in Jsoup? (It's hard to believe it's a bug, as their unit tests would have failed early on.)

Upvotes: 1

Views: 819

Answers (1)

user17022635

From the documentation for Jsoup.clean(java.lang.String, org.jsoup.safety.Safelist):

Note that as this method does not take a base href URL to resolve attributes with relative URLs against, those URLs will be removed, unless the input HTML contains a <base href> tag. If you wish to preserve those, use the clean(String html, String baseHref, Safelist) method instead, and enable Safelist.preserveRelativeLinks(boolean).

    String html = "<a href='/foo/'>Foo</a>";
    Safelist safelist = Safelist.relaxed();
    safelist.preserveRelativeLinks(true);
    String clean = Jsoup.clean(html, "http://", safelist);
    System.out.println(clean);

This prints:

    <a href="/foo/">Foo</a>
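
To see why a base URI matters here, the following is a rough, stdlib-only sketch of the idea described in the documentation note. It is not jsoup's actual implementation: the class name RelativeHrefSketch, the method hasAllowedScheme, and the list of allowed schemes are all assumptions made for illustration. The point is only that a relative URL such as /foo/ has no scheme of its own, so a scheme check can only pass once the URL has been resolved against a base.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class RelativeHrefSketch {
    // Assumed stand-in for the protocols a safelist might permit on a[href].
    static final List<String> ALLOWED_SCHEMES = Arrays.asList("http", "https", "ftp", "mailto");

    // Hypothetical helper: resolves href against baseUri (if one is given)
    // and reports whether the resulting URI has an allowed scheme.
    static boolean hasAllowedScheme(String href, String baseUri) {
        URI uri = URI.create(href);
        if (baseUri != null && !baseUri.isEmpty()) {
            uri = URI.create(baseUri).resolve(uri);
        }
        String scheme = uri.getScheme(); // null for a relative URI
        return scheme != null && ALLOWED_SCHEMES.contains(scheme.toLowerCase());
    }

    public static void main(String[] args) {
        // No base URI: "/foo/" stays relative and has no scheme to check.
        System.out.println(hasAllowedScheme("/foo/", null)); // prints "false"
        // With a base URI the link resolves to an absolute http URL.
        System.out.println(hasAllowedScheme("/foo/", "http://example.com/")); // prints "true"
        System.out.println(URI.create("http://example.com/").resolve("/foo/")); // prints "http://example.com/foo/"
    }
}
```

In the jsoup call above, passing a base URI plays the role of the second argument here, and preserveRelativeLinks(true) asks jsoup to keep the original relative value in the output rather than the resolved one.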

Upvotes: 4
