Reputation: 3122
It appears that loadHTML and loadHTMLFile, when given files representing sections of an HTML document, fill in html and body tags for each section, as revealed when I output with the following:
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$elements = $doc->getElementsByTagName('*');
if( !is_null($elements) ) {
    foreach( $elements as $element ) {
        echo "<br/>". $element->nodeName. ": ";
        $nodes = $element->childNodes;
        foreach( $nodes as $node ) {
            echo $node->nodeValue. "\n";
        }
    }
}
Since I plan to assemble these parts into the larger document within my own code, and I've been instructed to use DOMDocument to do it, what can I do to prevent this behavior?
Upvotes: 1
Views: 416
Reputation: 316969
This is one of several modifications that the HTML parser module of libxml makes to the document in order to work with broken HTML. It only occurs when using loadHTML and loadHTMLFile on partial markup. If you know the partial is valid X(HT)ML, use load and loadXML instead.
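As a minimal sketch of the loadXML route, assuming the partial is well-formed XML with a single root element (the sample string here is only an illustration):

```php
<?php
// Assumption: the fragment is well-formed XML with exactly one root element.
$xml = '<p>I am a fragment</p>';

$dom = new DOMDocument;
$dom->loadXML($xml); // the XML parser does not add html or body wrappers

// Passing a node to saveXML() serializes just that node, without the XML declaration.
$out = $dom->saveXML($dom->documentElement);
echo $out;
```

A partial with several top-level elements, or with HTML-only constructs such as an unclosed <br>, would fail to parse this way.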
You could use
$doc->saveXml($doc->getElementsByTagName('body')->item(0));
to dump the outerHTML of the body element, e.g. <body>anything else</body>, and then strip the body element with str_replace or extract the inner HTML with substr:
$html = '<p>I am a fragment</p>';
$dom = new DOMDocument;
$dom->loadHTML($html); // added html and body tags
echo substr(
    $dom->saveXml(
        $dom->getElementsByTagName('body')->item(0)
    ),
    6, -7 // skip the leading "<body>" (6 chars) and the trailing "</body>" (7 chars)
);
// <p>I am a fragment</p>
Note that this will use XHTML compliant markup, so <br> would become <br/>. As of PHP 5.3.5, there is no way to pass a node to saveHTML(). A bug request has been filed.
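An alternative to the fixed substr offsets, sketched here under the assumption that the fragment parses cleanly, is to serialize each child of the body element in turn. The variable names are illustrative only:

```php
<?php
$html = '<p>I am a fragment</p><p>So am I</p>';

$dom = new DOMDocument;
$dom->loadHTML($html); // the parser wraps the fragment in html and body

$body = $dom->getElementsByTagName('body')->item(0);

// Concatenate the serialized children, which leaves out the body tags themselves.
$inner = '';
foreach ($body->childNodes as $child) {
    $inner .= $dom->saveXML($child);
}
echo $inner;
```

This keeps working if the body element ever carries attributes, where the hard-coded offset of 6 would not. The output is still XHTML-style markup, as with the substr version.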
Upvotes: 1
Reputation: 97815
The closest you can get is to use the DOMDocumentFragment.
Then you can do:
$doc = new DOMDocument();
...
$f = $doc->createDocumentFragment();
$f->appendXML("<foo>text</foo><bar>text2</bar>");
$someElement->appendChild($f);
However, this expects XML, not HTML.
In any case, I think you're creating an artificial problem. Since you know the behavior is to create the html and body tags, you can just extract the elements in the file from within the body tag and then import them into the DOMDocument where you're assembling the final file. See DOMDocument::importNode.
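A rough sketch of that approach follows. The target markup, the div with id "main", and the variable names are assumptions made up for illustration, not part of the question:

```php
<?php
// Hypothetical document being assembled, with a container for the parts.
$target = new DOMDocument;
$target->loadHTML('<html><body><div id="main"></div></body></html>');
$container = $target->getElementsByTagName('div')->item(0);

// Load one part; the parser adds html and body around it.
$part = new DOMDocument;
$part->loadHTML('<p>I am a fragment</p><p>So am I</p>');
$body = $part->getElementsByTagName('body')->item(0);

// importNode() makes a deep copy owned by $target; the source nodes stay in $part.
foreach ($body->childNodes as $child) {
    $container->appendChild($target->importNode($child, true));
}

$out = $target->saveXML($container);
echo $out;
```

Since only the children of body are imported, the html and body tags added to each part never reach the assembled document.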
Upvotes: 0