Nadi Hassan Hassan

Reputation: 142

Allowing full HTML to be parsed in HTMLPurifier

This is a problem I've had for a long time. I currently accept a full HTML page from the user as input and want to filter/clean it. The problem with HTMLPurifier is that it removes the html, head, and body tags, as well as the styles in the head. I've googled, looked at the forums, and tried implementing what was written there, with no luck. Can someone help?

What I want: to keep the HTML, HEAD, STYLE, and BODY tags.

What I have done:

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'test');
$config->set('HTML.DefinitionRev', 1);
$config->set('HTML.AllowedElements', array('html', 'head', 'body', 'style', 'div', 'p'));

// Register the document-level elements so HTML.AllowedElements can refer to them
if ($def = $config->maybeGetRawHTMLDefinition()) {
    $def->addElement('html', 'Block', 'Inline', 'Common', array());
    $def->addElement('head', 'Block', 'Inline', 'Common', array());
    $def->addElement('style', 'Block', 'Inline', 'Common', array());
    $def->addElement('body', 'Block', 'Inline', 'Common', array());
}

Upvotes: 4

Views: 1138

Answers (4)

machinateur

Reputation: 500

It requires some amount of work, but it is possible to implement this yourself.

Explaining every step would be too much for this answer, but I ran into exactly the same problem: I wanted to sanitize whole HTML documents and had to find out the hard way how the library works under the hood.

In short:

  • Some settings have to be tweaked
  • Custom elements and attributes have to be added and configured

I've explained the approach for my use case, based on a Shopware example, on my blog: https://machinateur.dev/blog/how-to-sanitize-full-html-5-documents-with-htmlpurifier.

Upvotes: 0

Nadi Hassan Hassan

Reputation: 142

End result: HTMLPurifier does not natively support parsing full HTML documents. Either extend it or find a pass-through.

Upvotes: 0

user824425

Reputation:

You need to set:

$config->set('Core.ConvertDocumentToFragment', false);

For whatever reason, Core.ConvertDocumentToFragment defaults to true, even though the documentation states that "for most inputs, this processing is not necessary".

I was bitten by this too. All I got from the error collector was the cryptic message "Removed document metadata tags", which in turn is a translation from the internal message "Lexer: Extracted body".
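A minimal sketch of that setting inside a complete purify() call, assuming HTML Purifier 4.x installed via Composer (the autoload path and the sample document are assumptions, not from the answer). Core.CollectErrors is switched on so the error collector mentioned above can be read back. The directive only stops the body extraction; whether html, head, style and body then survive still depends on those elements being defined and allowed (for example as in the question), which this sketch does not verify.

```php
<?php
// Assumption: HTML Purifier installed with Composer (package ezyang/htmlpurifier).
require __DIR__ . '/vendor/autoload.php';

// Illustrative full document, not taken from the question.
$html = '<!DOCTYPE html><html><head><style>p { color: red; }</style></head>'
      . '<body><p>Hello</p></body></html>';

$config = HTMLPurifier_Config::createDefault();

// Defaults to true, which reduces a full document to the contents of <body>.
$config->set('Core.ConvertDocumentToFragment', false);

// Collect the purifier's error messages so they can be inspected after purify().
$config->set('Core.CollectErrors', true);

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($html);

echo $clean, PHP_EOL;

// The error collector is registered on the purifier's context during purify().
echo $purifier->context->get('ErrorCollector')->getHTMLFormatted($config), PHP_EOL;
```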

Upvotes: 0

MilanG

Reputation: 7114

Why not use strip_tags? It supports a list of allowed tags.

http://www.php.net/manual/en/function.strip-tags.php
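For comparison, a small sketch of strip_tags using the tags from the question as the allow-list (the sample input is made up for illustration). As the PHP manual warns, strip_tags leaves the attributes of allowed tags untouched, and for tags it removes it still keeps their text content, so an onclick handler and the body of a script element both get through. It filters tag names only; it does not sanitize.

```php
<?php
// Allow-list taken from the question: html, head, style, body, div, p.
$allowed = '<html><head><style><body><div><p>';

// Illustrative input: one disallowed inline tag, one script, one event handler.
$input = '<html><head><style>p { color: red; }</style></head>'
       . '<body><p onclick="alert(1)">Hi <b>there</b></p>'
       . '<script>alert(2)</script></body></html>';

// Tags not in $allowed are removed, but their text content is kept,
// and allowed tags keep all of their attributes.
$output = strip_tags($input, $allowed);

echo $output, PHP_EOL;
// <html><head><style>p { color: red; }</style></head><body><p onclick="alert(1)">Hi there</p>alert(2)</body></html>
```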

Upvotes: 0
