Reputation: 9858
I am using HTMLDom to manipulate a string rather than a complete web page. When I call saveHTML(), it automatically adds doctype and html tags.
$str = 'fragment containing html';
$str = utf8_encode($str);
$doc = new DOMDocument();
$doc->loadHTML($str);
// ...do stuff...
$str = $doc->saveHTML();
What is the correct way to save a fragment of HTML without the extra tags being added automatically? Failing that, what is the correct way to remove them afterwards?
I used an HTML parser to avoid regexes, so it seems counter-intuitive to have to run them on the parser's output.
Upvotes: 1
Views: 1352
Reputation: 19512
PHP's DOMDocument repairs the document when you load HTML. That means it adds the html and body elements.
So you need to fetch all nodes inside body and save them as HTML.
$html = <<<'HTML'
<h1>Hello World</h1>
Text
<!-- comment -->
HTML;
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXPath($dom);
$result = '';
foreach ($xpath->evaluate('/html/body/node()') as $node) {
    $result .= $dom->saveHTML($node);
}
echo $result;
Here is another option, but it is not available everywhere yet. PHP 5.4 added the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options, which also depend on the installed libxml version.
$dom->loadHtml($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
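A minimal sketch of how these flags might be used on a fragment. The helper name saveFragment() and the wrapping div are assumptions for illustration, not part of the answer above; the wrapper is there because, with LIBXML_HTML_NOIMPLIED, the parser works most predictably when the input has a single top-level element. The scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original answer): returns the
// fragment as HTML without doctype, html, or body wrappers.
function saveFragment(string $fragment): string
{
    $dom = new DOMDocument();

    // Wrap the fragment in one container so there is a single root element.
    $dom->loadHTML(
        '<div>' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    // Serialize only the children of the temporary container.
    $result = '';
    foreach ($dom->documentElement->childNodes as $node) {
        $result .= $dom->saveHTML($node);
    }

    return $result;
}

echo saveFragment('<h1>Hello World</h1><p>Text</p>');
```

Any DOM manipulation would go between the loadHTML() call and the loop; the container itself is never written out.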
If the LIBXML_HTML_* options are not available in your PHP version, the first and best option would be to update PHP. PHP 5.3 is no longer maintained.
The second option is using DOMDocument::saveXML($node, LIBXML_NOEMPTYTAG). This will generate an XML (XHTML) fragment, but that should be enough for most cases.
The last option would be to strip the extra tags from the output with string functions.
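The saveXML() fallback could look roughly like this. It is a sketch under the assumption that the fragment is loaded without any special flags, so the nodes of interest end up inside the body element that DOMDocument adds; the sample input is made up.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<h1>Hello World</h1><p>Text</p>');

// DOMDocument added html and body, so look up body and walk its children.
$body = $dom->getElementsByTagName('body')->item(0);

$result = '';
foreach ($body->childNodes as $node) {
    // LIBXML_NOEMPTYTAG writes empty elements with an explicit closing tag
    // instead of the self-closing XML form.
    $result .= $dom->saveXML($node, LIBXML_NOEMPTYTAG);
}

echo $result;
```

Because the output is serialized as XML, it may differ from what saveHTML() would produce for the same nodes, which is worth checking if the fragment contains elements that have no closing tag in HTML.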
Upvotes: 2