Runnick

Reputation: 715

PHP DOMDocument produces incorrect output when the node is wrapped in a figure?

I'm trying to add some HTML to all links that contain an image.

The basic HTML loaded into the DOM looks like this:

<div class='content'>
    <a href="..."><img src=""></a>

    <figure>
       <a href="..."><img src=""></a>
       <figcaption>Caption</figcaption>
    </figure>
</div>

The code:

$content = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
$dom = new DOMDocument();
@$dom->loadHTML($content);

// Convert Images
$images = [];

foreach ($dom->getElementsByTagName('img') as $node) {
    $images[] = $node;
}

foreach ($images as $node) {
    $field_html = $dom->createDocumentFragment(); // create an empty fragment
    $field_html->appendXML('<span>11</span>');    // fill it with the markup to add
    $node->parentNode->appendChild($field_html);  // append it to the image's parent
}

$newHtml = preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $dom->saveHTML()));
return $newHtml; 

For a regular link containing an img, this produces the correct output:

<a href="..."><img src=""><span>11</span></a>

But when the link is inside a figure, the output is very strange. The link is duplicated and inserted into the figcaption:

<figure>
    <a href="..."><img src=""></a>
    <figcaption>Caption <a href="..."><span>11</span>
    </figcaption>
</figure>

Is that because DOMDocument doesn't understand the figure element?

Upvotes: 0

Views: 247

Answers (1)

miken32

Reputation: 42695

I was unable to reproduce your problem. My guess would be a misplaced element somewhere in your source HTML. But your code can be simplified quite a bit.

There's no need to put your image nodes into an array; you can work directly with the results of DOMDocument::getElementsByTagName().

As mentioned in the comments, you can set up DOMDocument::loadHTML() not to add the doctype and implied elements, instead of removing them later with potentially tricky string manipulations.

A simple DOMDocument::createElement() can be used for the element you want to append, instead of building a document fragment and parsing markup into it.

Finally, the error control operator @ should generally be avoided. Instead, libxml_use_internal_errors() can be used to set the error behaviour. This allows you to examine error messages with libxml_get_errors() if desired.

$content = <<< HTML
<div class="content">
    <a href="..."><img src=""></a>
    <figure>
       <a href="..."><img src=""></a>
       <figcaption>Caption</figcaption>
    </figure>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_use_internal_errors(false);

foreach ($dom->getElementsByTagName('img') as $node) {
     $node->parentNode->appendChild($dom->createElement("span", "11"));
}

$newHtml = $dom->saveHTML();
echo $newHtml;

Output:

<div class="content">
    <a href="..."><img src=""><span>11</span></a>
    <figure>
       <a href="..."><img src=""><span>11</span></a>
       <figcaption>Caption</figcaption>
    </figure>
</div>
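
If the real source HTML does contain a misplaced element, the collected libxml errors may help locate it. Here is a minimal sketch of inspecting them with libxml_get_errors(). The helper name loadWithDiagnostics() is hypothetical and only used for illustration. The exact messages, and whether any are reported at all, depend on the libxml2 version PHP was built against.

```php
<?php
/**
 * Load an HTML fragment and return the document together with the
 * parser messages. loadWithDiagnostics() is a hypothetical helper name.
 */
function loadWithDiagnostics(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError carrying line, column and message.
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Empty the error buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$dom, $messages];
}

$content = <<<HTML
<div class="content">
    <a href="..."><img src=""></a>
    <figure>
       <a href="..."><img src=""></a>
       <figcaption>Caption</figcaption>
    </figure>
</div>
HTML;

[$dom, $messages] = loadWithDiagnostics($content);

// Print whatever the parser complained about, one message per line.
foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```

Some libxml2 versions report HTML5 elements such as figure and figcaption as invalid tags. Those messages are usually harmless, but a message about mismatched or unexpected tags near the reported line would point at the kind of markup problem guessed at above.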

Upvotes: 1
