Reputation: 142
This is a problem I've had for a long time. I accept a full HTML page from the user as input and want to filter/clean it. The problem with HTMLPurifier is that it removes the html, head, and body tags, as well as the styles in the head. I've googled, looked at the forums, and tried implementing what was written there, with no luck. Can someone help?
What I want: to keep the html, head, style, and body tags.
What I have done:
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'test');
$config->set('HTML.DefinitionRev', 1);
$config->set('HTML.AllowedElements', array('html', 'head', 'body', 'style', 'div', 'p'));
if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Register html, head, style and body as custom elements
    $def->addElement('html', 'Block', 'Inline', 'Common', array());
    $def->addElement('head', 'Block', 'Inline', 'Common', array());
    $def->addElement('style', 'Block', 'Inline', 'Common', array());
    $def->addElement('body', 'Block', 'Inline', 'Common', array());
}
$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($html); // $html holds the user-supplied page
Upvotes: 4
Views: 1138
Reputation: 500
It requires some amount of work, but it is possible to implement this yourself.
Explaining every step would be too much for this answer, but I've run into the exact same problem. I wanted to sanitize HTML content as whole documents, and had to find out the hard way how the library works under the hood.
In short:
I've explained the approach for my use case, based on a Shopware example, on my blog: https://machinateur.dev/blog/how-to-sanitize-full-html-5-documents-with-htmlpurifier.
Upvotes: 0
Reputation: 142
End result: HTMLPurifier does not natively support parsing full HTML documents. Either extend it or find a pass-through.
Upvotes: 0
Reputation:
You need to set:
$config->set('Core.ConvertDocumentToFragment', false);
For whatever reason, Core.ConvertDocumentToFragment defaults to true, even though the documentation states that "for most inputs, this processing is not necessary".
I was bitten by this too. All I got from the error collector was the cryptic message "Removed document metadata tags", which in turn is a translation of the internal message "Lexer: Extracted body".
Upvotes: 0
Reputation: 7114
Why not use strip_tags()? It accepts a list of allowed tags.
http://www.php.net/manual/en/function.strip-tags.php
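A minimal sketch of the strip_tags() suggestion, assuming PHP 7.4 or later (where the allowed tags can be passed as an array). The helper name keepDocumentTags() and the sample input are hypothetical. It keeps the html, head, style, and body tags from the question, but note the caveat from the PHP manual: strip_tags() does not modify attributes on allowed tags, so an event handler such as onclick passes straight through. That makes it a tag filter, not an HTML sanitizer.

```php
<?php
// Sketch of the strip_tags() approach; requires PHP 7.4+ for the array form
// of $allowed_tags. keepDocumentTags() is a hypothetical helper name.
function keepDocumentTags(string $html): string
{
    // Tag names taken from the question; all other tags are stripped.
    $allowed = ['html', 'head', 'style', 'body', 'div', 'p'];
    return strip_tags($html, $allowed);
}

// Hypothetical user input containing a script and an inline event handler.
$input = '<html><head><style>p { color: red; }</style></head>'
       . '<body><p onclick="steal()">Hi</p><script>steal()</script></body></html>';

// The <script> tags are removed, but the onclick attribute on <p> is kept,
// because strip_tags() never inspects attributes of allowed tags.
echo keepDocumentTags($input), PHP_EOL;
```

If the goal is security rather than just tag filtering, the attribute issue above is the reason to stick with an actual sanitizer such as HTMLPurifier.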
Upvotes: 0