Reputation: 142

Allowing full html to be parsed in HTMLPurifier

This is a problem I've had for a long time. I currently accept a full HTML page from the user as input and want to filter and clean it. The problem with HTMLPurifier is that it removes the html, head, and body tags, as well as the styles in the head. I've googled, looked at the forums, and tried implementing what was written there, all to no avail. Can someone help?

What I want: to keep the HTML, HEAD, STYLE, and BODY tags.

What I have done:

    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.DefinitionID', 'test');
    $config->set('HTML.DefinitionRev', 1);
    $config->set('HTML.AllowedElements', array('html', 'head', 'body', 'style', 'div', 'p'));

    // Try to register the document-level elements on the raw HTML definition
    if ($def = $config->maybeGetRawHTMLDefinition()) {
        $def->addElement('html', 'Block', 'Inline', 'Common', array());
        $def->addElement('head', 'Block', 'Inline', 'Common', array());
        $def->addElement('style', 'Block', 'Inline', 'Common', array());
        $def->addElement('body', 'Block', 'Inline', 'Common', array());
    }

Upvotes: 4

Answers (4)

machinateur

Reputation: 500

It requires some amount of work, but it is possible to implement this yourself.

Listing every step would be too much to explain here, but I've encountered the exact same problem. I wanted to sanitize HTML content as whole documents and had to find out the hard way how the library works under the hood.

In short:

Some settings have to be tweaked
Custom elements and attributes have to be added and configured

I've explained the approach for my use case, based on a Shopware example, on my blog: https://machinateur.dev/blog/how-to-sanitize-full-html-5-documents-with-htmlpurifier.
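For readers who want a feel for the two steps listed above, here is a minimal sketch of the general customization pattern. It is not the approach from the blog post, only an illustration under stated assumptions: HTML Purifier is installed through Composer, the definition ID `full-document-sketch` is a made-up name, and `section` stands in for whichever elements you actually need.

```php
<?php
// Assumption: HTML Purifier was installed with Composer; adjust the path otherwise.
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Step 1: tweak settings. A custom definition needs an ID and a revision.
$config->set('HTML.DefinitionID', 'full-document-sketch'); // hypothetical ID
$config->set('HTML.DefinitionRev', 1);
// Disable the definition cache while experimenting, so changes apply immediately.
$config->set('Cache.DefinitionImpl', null);

// Step 2: register custom elements on the raw HTML definition.
if ($def = $config->maybeGetRawHTMLDefinition()) {
    // addElement(name, type, content model, attribute collections)
    // 'section' is only a placeholder element for illustration.
    $def->addElement('section', 'Block', 'Flow', 'Common');
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<section><p>Hello</p></section>');
```

Document-level elements such as html, head, and body are likely to need more than this, which is presumably why the full write-up is longer.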

Upvotes: 0

Nadi Hassan Hassan

Reputation: 142

End result: HTMLPurifier does not natively support parsing full HTML documents. Either extend it or find a workaround.

Upvotes: 0

user824425

Reputation:

You need to set:

$config->set('Core.ConvertDocumentToFragment', false);

For whatever reason, Core.ConvertDocumentToFragment defaults to true, even though the documentation states that "for most inputs, this processing is not necessary".

I was bitten by this too. All I got from the error collector was the cryptic message "Removed document metadata tags", which in turn is a translation from the internal message "Lexer: Extracted body".
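A minimal sketch of where that directive fits, assuming HTML Purifier is installed through Composer and using a made-up sample document. It also turns on the error collector so that messages like the one quoted above become visible. Whether the html, head, and body tags actually survive may still depend on the HTML definition in use, so treat this as a starting point rather than a complete fix.

```php
<?php
// Assumption: HTML Purifier was installed with Composer; adjust the path otherwise.
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Stop the lexer from extracting only the contents of <body>.
$config->set('Core.ConvertDocumentToFragment', false);

// Collect errors so that removed tags are reported instead of vanishing silently.
$config->set('Core.CollectErrors', true);

$purifier = new HTMLPurifier($config);

// Hypothetical sample input, for illustration only.
$dirty = '<html><head><title>Demo</title></head><body><p>Hello</p></body></html>';
$clean = $purifier->purify($dirty);

echo $clean, "\n";

// Print what the purifier changed or removed.
$errors = $purifier->context->get('ErrorCollector');
echo $errors->getHTMLFormatted($config);
```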

Upvotes: 0

MilanG

Reputation: 7114

Why not use strip_tags? It supports a list of allowed tags.

http://www.php.net/manual/en/function.strip-tags.php
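A small sketch of what that looks like, using a made-up input string. Note that strip_tags only removes the tags themselves: attributes on allowed tags are kept, and the text inside a removed tag stays in the output.

```php
<?php
// Hypothetical input, for illustration only.
$dirty = '<p style="color:red">Hi <b>there</b></p><script>alert(1)</script>';

// The second argument lists the tags to keep; everything else is stripped.
$clean = strip_tags($dirty, '<p>');

echo $clean, "\n";
// prints: <p style="color:red">Hi there</p>alert(1)
```

The `<b>` and `<script>` tags are gone, but the style attribute and the script body text remain.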

Upvotes: 0
