vikas

Reputation: 61

HTMLPurifier Library removes all custom tags

I am using version 4.0.0 of the HTML Purifier library, and all of my requests are purified by it. Sometimes we need to allow arbitrary custom tags and XML tags that are not part of the standard definition, but the library removes every tag it does not support.

I know we can write definitions to support custom attributes and tags, but my problem is the reverse: I want to allow any custom tag and disallow only a few, such as script and iframe.

Is there a way to achieve this in the library?

Upvotes: 1

Views: 1068

Answers (1)

pinkgothic

Reputation: 6179

Short answer

No.

Long answer

Generally speaking, HTML Purifier's principle is that of a whitelist. This means that it must know about all legal constellations and will discard anything that it doesn't recognise as such.

Even if you use HTML.ForbiddenElements to put HTML Purifier into a blacklist mode, that's a secondary design feature. HTML Purifier still insists that it knows of all elements and attributes that it's fed and will strip anything else.
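To make that concrete, here is a minimal sketch of the blacklist-style directive. It assumes HTML Purifier is installed and that the include path below matches your setup; `<mytag>` is a hypothetical custom element used only for illustration.

```php
<?php
// Assumption: adjust this path to wherever HTML Purifier's autoloader lives.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Forbid an element that the default definition would otherwise allow.
$config->set('HTML.ForbiddenElements', array('a'));

$purifier = new HTMLPurifier($config);

// <a> is removed because it is forbidden.
// <mytag> (hypothetical) is removed too: it is not on the forbidden list,
// but HTML Purifier's definition does not know it.
$dirty = '<p>Hello <mytag>world</mytag> <a href="http://example.com/">link</a></p>';
echo $purifier->purify($dirty);
```

The forbidden list only narrows the set of elements HTML Purifier already knows about; it never widens it.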

Why the restriction?

A vivid illustration of why a fundamentally blacklist-based approach is a bad idea is the vast number of elements and attributes that HTML5 added to the HTML specification.

Consider the HTML5 Security Cheatsheet. If you'd implemented a blacklist approach before HTML5 was supported by browsers, you might not have realised that the new elements and attributes it lists would open up fresh attack vectors.

You see the problem. This is why HTML Purifier doesn't allow you to trust arbitrary custom tags.

What to do

I would recommend teaching HTML Purifier all of your custom tags and attributes. If they are not fully arbitrary, this approach can really help. I had to code many custom Outlook tags and attributes into HTML Purifier for a project once - while development was tedious, the net gain (robust security) was worth it.

If you do decide to forge that path, take a look at the "Enduser: Customize" documentation.

The example on the page implements <form>, which is not natively supported by HTML Purifier. From a use-case perspective, it's not technically a custom element, but it illustrates the process well enough. Quoting the tutorial, which starts from the FORM entry in the HTML DTD:

Juicy! With just this, we can answer four of our five questions:

  1. What is the element's name? form
  2. What content set does this element belong to? Block (this needs a little sleuthing; I find the easiest way is to search the DTD for FORM and determine which set it is in.)
  3. What are the allowed children of this element? One or more flow elements, but no nested forms
  4. What attributes does the element allow that are general? Common
  5. What attributes does the element allow that are specific to this element? A whole bunch, see ATTLIST; we're going to do the vital ones: action, method and name

Time for some code:

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // remove this later!
$def = $config->getHTMLDefinition(true);
$def->addAttribute(
    'a',
    'target',
    new HTMLPurifier_AttrDef_Enum( array('_blank','_self','_parent','_top') )
);
$form = $def->addElement(
    'form',   // name
    'Block',  // content set
    'Flow',   // allowed children
    'Common', // attribute collection
    array( // attributes
        'action*' => 'URI',
        'method' => 'Enum#get,post', // Enum values are comma-separated
        'name' => 'ID'
    )
);
$form->excludes = array('form' => true);

Each of the parameters corresponds to one of the questions we asked. Notice that we added an asterisk to the end of the action attribute to indicate that it is required. If someone specifies a form without that attribute, the tag will be axed. Also, the extra line at the end is a special extra declaration that prevents forms from being nested within each other.
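The same five questions apply to a genuinely custom element. As a rough sketch, assuming a hypothetical block-level `<note>` element with a free-text `type` attribute (neither the element, the attribute, nor the definition ID below comes from the tutorial, and the include path is an assumption about your installation), the registration and a purify call might look like this:

```php
<?php
// Assumption: adjust this path to wherever HTML Purifier's autoloader lives.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'custom-note-element'); // hypothetical ID
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // disable caching while developing
$def = $config->getHTMLDefinition(true);

// Register the hypothetical <note> element.
$def->addElement(
    'note',   // name
    'Block',  // content set
    'Flow',   // allowed children
    'Common', // attribute collection
    array(    // element-specific attributes
        'type' => 'Text'
    )
);

$purifier = new HTMLPurifier($config);

// <note> is now known and should survive purification;
// <other> is still unknown, so its tags are stripped.
echo $purifier->purify('<note type="info">Kept</note><other>Stripped</other>');
```

Every custom tag you want to keep needs a registration like this one, which is why the approach only works when the set of tags is known in advance.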

Upvotes: 2
