Reputation: 11
On a website, I am inserting text that contains a hyperlink, like so:
For more information <a href="https://example.com/sample_doc.html">read the docs</a>.
HTML Purifier strips the entire href attribute, so I'm not able to insert these links.
Output of HTML Purifier (by using the demo website):
For more information <a>read the docs</a>.
Is there a way to change the config so that HTML Purifier allows underscores in my URLs?
I read the documentation of HTML Purifier but couldn't find an answer to my question.
My current config (default) looks like this:
{
"Attr.AllowedFrameTargets": [
"_blank"
],
"Attr.EnableID": true,
"HTML.AllowedComments": [
"pagebreak"
],
"HTML.SafeIframe": true,
"URI.SafeIframeRegexp": "%^(https?:)?//(www.youtube.com/|player.vimeo.com/)%"
}
Upvotes: 0
Views: 30
Reputation: 6179
The underscore in your URL isn't the problem. The code snippet you're testing:
For more information <a href="https:/example.com/sample_doc.html">read the docs</a>.
...is missing a / after https:/. If you enter this:
For more information <a href="https://example.com/sample_doc.html">read the docs</a>.
...then HTML Purifier will leave it alone.
The configuration you've toggled, Core.AllowHostnameUnderscore, allows URLs with an underscore in the hostname, like https://foo_bar.com/. From the documentation:
By RFC 1123, underscores are not permitted in host names. (This is in contrast to the specification for DNS, RFC 2181, which allows underscores.) However, most browsers do the right thing when faced with an underscore in the host name, and so some poorly written websites are written with the expectation this should work. Setting this parameter to true relaxes our allowed character check so that underscores are permitted.
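To make the hostname/path distinction concrete, here is a minimal Python sketch (the language is my choice for illustration, not anything HTML Purifier uses) that checks where the underscore sits in a URL. The helper name underscore_location is hypothetical, and urlsplit only splits the URL; it does not validate it the way HTML Purifier does.

```python
from urllib.parse import urlsplit


def underscore_location(url):
    """Report whether an underscore appears in the hostname or in the path."""
    parts = urlsplit(url)
    return {
        # hostname is None when the URL has no authority part
        "hostname": "_" in (parts.hostname or ""),
        "path": "_" in parts.path,
    }


# URL from the question: the underscore is in the path only.
question_url = underscore_location("https://example.com/sample_doc.html")

# Example from the documentation quote: the underscore is in the hostname.
doc_example = underscore_location("https://foo_bar.com/")
```

Only the second kind of URL is what Core.AllowHostnameUnderscore is about.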
You shouldn't actually need this. If you're likely to have a lot of input that contains https:/ as opposed to https:// in its URLs, consider doing some preprocessing instead to replace these faulty URLs.
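As a rough sketch of that kind of preprocessing, assuming you can run a step over the input before it reaches HTML Purifier, something like the following would repair the single-slash form. It is written in Python purely for illustration, and fix_single_slash_urls is a hypothetical helper name; the same regular expression should carry over to other languages. Note that it rewrites every occurrence in the input, including any that appear in visible text rather than in an href.

```python
import re

# Matches "http:/" or "https:/" that is NOT followed by a second slash.
SINGLE_SLASH_SCHEME = re.compile(r"\b(https?):/(?!/)", re.IGNORECASE)


def fix_single_slash_urls(html):
    """Turn scheme:/host into scheme://host, leaving correct URLs untouched."""
    return SINGLE_SLASH_SCHEME.sub(r"\1://", html)


broken = 'For more information <a href="https:/example.com/sample_doc.html">read the docs</a>.'
fixed = fix_single_slash_urls(broken)
```

Input that already uses https:// passes through unchanged, so the step is safe to apply to everything.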
Upvotes: 0
Reputation: 11
I found the solution in the documentation of HTML Purifier: http://htmlpurifier.org/live/configdoc/plain.html#Core.AllowHostnameUnderscore
Adding the following line to the Default.json config file now allows me to use underscores in hostnames and URLs:
"Core.AllowHostnameUnderscore": true,
Upvotes: 1