Is this a bug in PHP's DOMDocument Library?

Question

I'm trying to parse some HTML with PHP, but the output is not what I expect. Here is the relevant code, which can be run from the command line ($ php script.php).

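<?php
function images_to_links($text)
{
    $dom = new DOMDocument();

    // Load the HTML, suppressing libxml warnings and restoring the previous setting afterwards
    $internalErrors = libxml_use_internal_errors(true);
    $dom->loadHTML(mb_convert_encoding($text, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);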
    libxml_use_internal_errors($internalErrors);

    // Extract images from the dom
    $xpath = new DOMXPath($dom);

    // Other processing code removed for this example 

    $cleaned_html = $dom->saveHTML();
    return $cleaned_html;
}

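// "#" is a placeholder href; the original link targets were not preserved.
$some_text = <<<EOD
<blockquote>asdf</blockquote>
<a href="#">click here</a>


<a href="#">another link</a>
EOD;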

print images_to_links($some_text);

Expected output:

asdf
click here


another link

Actual output (notice how the blockquote has wrapped around the other elements):

asdfclick here
another link

Is there an error in my code, or is this a bug in DOMDocument?

javier_domenech · Accepted Answer

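LibXML requires a root node, so it interprets the first element it finds as the root node (ignoring its closing tag). Because LIBXML_HTML_NOIMPLIED stops libxml from adding the implied <html> and <body> wrappers, the <blockquote> becomes that root, and everything after it ends up inside it.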
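A common workaround that follows from this explanation is to give libxml the single root it wants: wrap the fragment in a throwaway <div>, parse it with the same flags, and serialise only the wrapper's children. The sketch below assumes ext-dom and ext-mbstring are available; clean_fragment() is a hypothetical name, and mb_encode_numericentity() stands in for the question's mb_convert_encoding(..., 'HTML-ENTITIES', ...) call, which PHP 8.2 deprecates.

```php
<?php
// Sketch of a workaround: give libxml a single root element by wrapping the
// fragment in a throwaway <div>, then serialise only the wrapper's children.
// clean_fragment() is a hypothetical helper name.
function clean_fragment(string $html): string
{
    $dom = new DOMDocument();

    // Turn non-ASCII characters into numeric entities so libxml does not
    // misread UTF-8 input (replaces the deprecated HTML-ENTITIES conversion).
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // Suppress libxml warnings while parsing, then restore the previous setting.
    $internalErrors = libxml_use_internal_errors(true);
    $dom->loadHTML('<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($internalErrors);

    // DOMXPath processing of $dom would go here.

    // The <div> is now the document element; output its children only,
    // so the wrapper itself never appears in the result.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$fragment = "<blockquote>asdf</blockquote>\n<a href=\"#\">click here</a>\n<a href=\"#\">another link</a>";
echo clean_fragment($fragment), "\n";
```

The same idea carries over to images_to_links(): wrap $text before calling loadHTML(), and build the return value from the wrapper's children instead of calling $dom->saveHTML() on the whole document. Another option is to drop LIBXML_HTML_NOIMPLIED and read the children of the implied <body> element instead.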
