Hamster

Reputation: 3122

How do I assemble pieces of HTML into a DOMDocument?

It appears that loadHTML and loadHTMLFile, when given files that each represent a section of an HTML document, fill in html and body tags for each section, as revealed when I output the result with the following:

$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$elements = $doc->getElementsByTagName('*');

if( !is_null($elements) ) {
    foreach( $elements as $element ) {
        echo "<br/>". $element->nodeName. ": ";

        $nodes = $element->childNodes;
        foreach( $nodes as $node ) {
            echo $node->nodeValue. "\n";
        }
    }
}

Since I plan to assemble these parts into the larger document within my own code, and I've been instructed to use DOMDocument to do it, what can I do to prevent this behavior?

Upvotes: 1

Views: 416

Answers (2)

Gordon

Reputation: 316969

This is one of several modifications that the HTML parser module of libxml makes to the document in order to cope with broken HTML. It occurs only when using loadHTML and loadHTMLFile on partial markup. If you know the partial is valid X(HT)ML, use load and loadXML instead.
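As a rough illustration of the loadXML route, here is a minimal sketch. The $partial string is a made-up example, and it assumes the fragment is well-formed XML with a single root element, since the XML parser will reject anything else:

```php
<?php
// Hypothetical fragment: must be well-formed XML with exactly one root element.
$partial = '<div id="intro"><p>I am a fragment</p><br/></div>';

$doc = new DOMDocument();
$doc->loadXML($partial);

// The root is the fragment's own element; no html or body wrapper is added.
$rootName = $doc->documentElement->nodeName;
$wrappers = $doc->getElementsByTagName('body')->length;

echo $rootName, "\n"; // div
echo $wrappers, "\n"; // 0
```

If a partial contains several top-level elements or unclosed tags such as <br>, this will fail and the HTML loader is the only option.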

You could use

$doc->saveXML($doc->getElementsByTagName('body')->item(0));

to dump the outer HTML of the body element, e.g. <body>anything else</body>, and then either strip the body tags with str_replace or extract the inner HTML with substr.

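$html = '<p>I am a fragment</p>';
$dom = new DOMDocument;
$dom->loadHTML($html); // the parser adds html and body tags
// Serialize the body element, then cut off the leading "<body>" (6 characters)
// and the trailing "</body>" (7 characters).
echo substr(
    $dom->saveXML(
        $dom->getElementsByTagName('body')->item(0)
    ),
    6, -7
);
// <p>I am a fragment</p>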

Note that this produces XHTML-compliant markup, so <br> becomes <br/>. As of PHP 5.3.5, there is no way to pass a node to saveHTML(). A bug report requesting this has been filed.
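For readers on a newer PHP: the node parameter was added to saveHTML() in PHP 5.3.6, so the substr step can be avoided there. The following is only a sketch under that version assumption; the $inner variable and the sample markup are made up for illustration:

```php
<?php
// Assumes PHP >= 5.3.6, where saveHTML() accepts a node argument.
$html = '<p>I am a fragment</p><br>';
$dom = new DOMDocument();
$dom->loadHTML($html); // the parser adds html and body tags

$body = $dom->getElementsByTagName('body')->item(0);

// Serialize each child of body as HTML, leaving the wrapper tags out.
$inner = '';
foreach ($body->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $inner; // HTML serialization, so <br> is not rewritten as <br/>
```

The serializer may insert line breaks between block elements, so the output should not be expected to match the input byte for byte.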

Upvotes: 1

Artefacto

Reputation: 97815

The closest you can get is to use a DOMDocumentFragment.

Then you can do:

$doc = new DOMDocument();
...
$f = $doc->createDocumentFragment();
$f->appendXML("<foo>text</foo><bar>text2</bar>"); 
$someElement->appendChild($f);

However, this expects XML, not HTML.
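To make the snippet above runnable and to show what "expects XML" means in practice, here is a small sketch. The section element standing in for $someElement and the second fragment $g are assumptions made for the example:

```php
<?php
// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Hypothetical container standing in for $someElement.
$someElement = $doc->createElement('section');
$doc->appendChild($someElement);

$f = $doc->createDocumentFragment();
// Well-formed XML: several top-level elements are allowed in a fragment.
$ok = $f->appendXML('<foo>text</foo><bar>text2</bar>');
if ($ok) {
    $someElement->appendChild($f);
}

// HTML that is not well-formed XML is rejected: <br> is never closed.
$g = $doc->createDocumentFragment();
$bad = $g->appendXML('<p>line one<br>line two</p>');

$result = $doc->saveXML($someElement);
echo $result, "\n"; // <section><foo>text</foo><bar>text2</bar></section>
var_dump($bad);     // bool(false)
```

Unlike loadXML, appendXML does not require a single root element, which makes it convenient for partials that are already valid XHTML.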

In any case, I think you're creating an artificial problem. Since you know the behavior is to create the html and body tags, you can just extract the elements in the file from within the body tag and then import them into the DOMDocument where you're assembling the final file. See DOMDocument::importNode.
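A minimal sketch of that approach might look like the following. The $partials array and the skeleton markup of the target document are made up for the example; in the question's setup each partial would be loaded with loadHTMLFile instead:

```php
<?php
// Hypothetical partials; in practice these would come from the section files.
$partials = array(
    '<h1>Title</h1><p>First section</p>',
    '<p>Second section</p>',
);

// Target document that the sections are assembled into.
$target = new DOMDocument();
$target->loadHTML('<html><head><title>Assembled</title></head><body></body></html>');
$targetBody = $target->getElementsByTagName('body')->item(0);

foreach ($partials as $partial) {
    $part = new DOMDocument();
    $part->loadHTML($partial); // the parser adds html and body tags

    // Copy only what is inside the generated body element.
    $partBody = $part->getElementsByTagName('body')->item(0);
    foreach ($partBody->childNodes as $child) {
        // importNode() copies the node into $target; true means a deep copy.
        $targetBody->appendChild($target->importNode($child, true));
    }
}

$count = $targetBody->childNodes->length;
echo $target->saveHTML();
```

Because importNode creates a copy owned by the target document, the generated html and body wrappers of each partial are simply left behind.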

Upvotes: 0
