psithu

Reputation: 21

How to allow curly braces in href attributes inside content that is processed with HTML Purifier

I have a redactor type field in my CMS (I use Craft CMS), where the user can enter some "variables" like so:

"Hello, {name}"

The only problem is that when HTML Purifier is enabled, it strips any such "variables" that appear in href attributes and replaces them with a number. For example:

<a href="tel:{client tel}">{client tel}</a>

becomes

<a href="tel:7207">{client tel}</a>

I can of course disable HTML Purifier, but I would rather not. I'm just having difficulty finding the correct purifier config to allow the desired behavior. Can anybody help with this?

Upvotes: 2

Views: 1171

Answers (1)

Rob Ruchte

Reputation: 3707

This specific example is the result of two filters being applied in series. The first is percent-encoding the "path" portion of the value - everything after the tel scheme in the attribute value, resulting in tel:%7Bclient%20tel%7D. The second is a filter specific to tel: URL schemes that, according to the comments, "deletes all non-numeric characters, non-x characters from phone number, EXCEPT for a leading plus sign." - which leaves you with tel:7207.

From HTMLPurifier_URIScheme_tel->doValidate

// Delete all non-numeric characters, non-x characters
// from phone number, EXCEPT for a leading plus sign.
$uri->path = preg_replace('/(?!^\+)[^\dx]/', '',
    // Normalize e(x)tension to lower-case
    str_replace('X', 'x', $uri->path));

So this is really two problems: the first is the URL encoding of your braces, and the second is the regex in the tel: scheme.
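To see how the two steps combine, here is a minimal sketch that mimics them with PHP built-ins only. The function name simulateTelHref is hypothetical, and rawurlencode() is a rough stand-in for HTML Purifier's own percent-encoder (it encodes more characters than the library does), so treat this as an illustration of the mechanism rather than the library's actual code path. The regex is the one quoted above.

```php
<?php
// Hypothetical helper: approximates the two filtering steps with PHP built-ins.
function simulateTelHref(string $path): string
{
    // Step 1: percent-encode the path. rawurlencode() is only a stand-in
    // for HTML Purifier's encoder, but it treats braces and spaces the same way.
    $encoded = rawurlencode($path); // "{client tel}" -> "%7Bclient%20tel%7D"

    // Step 2: the tel: scheme regex quoted above. It drops everything
    // except digits and "x" (and a leading plus sign).
    $digits = preg_replace('/(?!^\+)[^\dx]/', '', str_replace('X', 'x', $encoded));

    return 'tel:' . $digits;
}

echo simulateTelHref('{client tel}'), PHP_EOL; // prints "tel:7207"
```

The digits that survive (7, 20, 7) come from the percent-encoded brace and space characters, which is why the placeholder turns into a seemingly random number.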

The easy way to solve this problem is to instruct HTMLPurifier to evaluate the href attribute on a tags as text rather than as a URI. The URI evaluation is very strict, as it should be. Since you need to pass invalid URIs through the filter, you can either use the default text filtering or create your own filter specific to your needs. I'll describe the former here; the latter is a more involved exercise.

Be aware that this will cause HTMLPurifier to evaluate all a href attributes as text, so you will lose the strict validation on all links. Make sure you understand the potential impacts on security in your application.

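$config = HTMLPurifier_Config::createDefault();
// A definition ID is required before the raw HTML definition can be customized.
$config->set('HTML.DefinitionID', 'trusted');

// maybeGetRawHTMLDefinition() returns null when a cached definition already exists.
if ($def = $config->maybeGetRawHTMLDefinition())
{
    // Validate the href attribute on a tags as plain text instead of as a URI.
    $def->addAttribute('a', 'href', 'Text');
}

$purifier = new HTMLPurifier($config);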

See the customizing docs for more details.

The main danger here is that you are removing the filtering of the javascript: scheme. The Text filter will escape script tags, but it will not filter inline commands in the scheme.

Input:

<a href="<script>alert(1)</script>">Script tag in href with alert</a>
<a href="javascript:alert(1)">javascript scheme with alert</a>

Default escaping:

<a href="">Script tag in href with alert</a>
<a>javascript scheme with alert</a>

Text escaping:

<a href="&lt;script&gt;alert(1)&lt;/script&gt;">Script tag in href with alert</a>
<a href="javascript:alert(1)">javascript scheme with alert</a>

When I do things like this, I use two different definitions, one called "trusted" that filters content from trusted sources like CMS admins who should know what they're doing, and one called "paranoid", that is used for content from untrusted sources.

Another strategy to mitigate the risk here would be to allow this permissive escaping when entering content into the CMS (trusted definition), then apply the stricter filtering after the content has been rendered (paranoid definition). It's good practice to escape on output anyway, to prevent stored XSS attacks.
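As a rough configuration sketch of the two-definition approach, assuming HTML Purifier is installed via Composer: the autoload path, the function names makeTrustedPurifier and makePurifierParanoid, and the placeholder value are all hypothetical, and the strtr() call only stands in for whatever your CMS does to substitute variables.

```php
<?php
// Assumption: HTML Purifier was installed with Composer; adjust the path as needed.
require_once __DIR__ . '/vendor/autoload.php';

// "trusted": href on a tags is validated as text, so {placeholders} survive.
function makeTrustedPurifier(): HTMLPurifier
{
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.DefinitionID', 'trusted');
    if ($def = $config->maybeGetRawHTMLDefinition()) {
        $def->addAttribute('a', 'href', 'Text');
    }
    return new HTMLPurifier($config);
}

// "paranoid": stock configuration with strict URI validation.
function makePurifierParanoid(): HTMLPurifier
{
    return new HTMLPurifier(HTMLPurifier_Config::createDefault());
}

// On save: keep the placeholders intact.
$stored = makeTrustedPurifier()->purify('<a href="tel:{client tel}">{client tel}</a>');

// On render: substitute the variables (hypothetical value), then filter strictly.
$rendered = strtr($stored, ['{client tel}' => '+15550100']);
echo makePurifierParanoid()->purify($rendered), PHP_EOL;
```

With this split, a javascript: href that slipped through the trusted definition should still be removed by the default URI validation before the content reaches the browser.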

Upvotes: 1
