offline

Reputation: 1619

How to remove specific dom element with PHP DOMDocument

I have this html in my database:

<p>some text 1</p>
<img src=\"http://www.example.com/images/some_image_1.jpg\">
<p>some text 2</p>
<p>some text 3</p>
<img src=\"http://www.example.com/images/some_image_2.jpg\">
<p>some text 4</p>
<p>some text 5</p>
<img src=\"http://www.example.com/images/some_image_3.jpg\">

I need to remove specific <img> tags conditionally: not all of them, only certain ones.

I have tried this, but it will remove all <img> tags, even if I do not want that:

$dom = new \DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);

$nodes = $dom->getElementsByTagName("img");

for($i = 0; $i < $nodes->length; $i++) {
    if ($i == 1) {
        continue;
    }
    $image = $nodes->item($i);
    $image->parentNode->removeChild($image);
}

return $dom->saveHTML();

Can someone help me with this? In this HTML example, say I want to remove the first and third images but keep the second one.

Also, I have noticed that the saveHTML() method adds <html><body> tags to my HTML, and I do not want that. I don't see any option to turn this off. Any help with that too?

Thanks in advance, I've been stuck on this for hours.

Upvotes: 1

Views: 4581

Answers (3)

themullet

Reputation: 863

The other answers didn't work for me. A comment in the documentation notes:

You can't remove DOMNodes from a DOMNodeList as you're iterating over them in a foreach loop

https://www.php.net/manual/en/domnode.removechild

$domNodeList = $domDocument->getElementsByTagName('p');

// Collect the elements first, then remove them in a second pass
$domElemsToRemove = array();

foreach ($domNodeList as $domElement) {
    // ...do stuff with $domElement...
    $domElemsToRemove[] = $domElement;
}

foreach ($domElemsToRemove as $domElement) {
    $domElement->parentNode->removeChild($domElement);
}

That approach worked for me for removing some tags.
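Applied to the images from the question, the collect-then-remove idea could look roughly like the sketch below. The function name removeImagesExcept and the $keep parameter are made up for illustration; the sample markup is shortened from the question.

```php
<?php
// Hypothetical helper: removes every <img> except those at the given
// zero-based positions.
function removeImagesExcept(string $html, array $keep): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Snapshot the live DOMNodeList into a plain array (re-indexed from 0),
    // so removing nodes cannot shift what we are iterating over.
    $images = iterator_to_array($dom->getElementsByTagName('img'), false);

    foreach ($images as $i => $image) {
        if (in_array($i, $keep, true)) {
            continue; // this position should stay in the document
        }
        $image->parentNode->removeChild($image);
    }

    return $dom->saveHTML();
}

$html = '<p>some text 1</p><img src="http://www.example.com/images/some_image_1.jpg">'
      . '<p>some text 2</p><img src="http://www.example.com/images/some_image_2.jpg">'
      . '<p>some text 3</p><img src="http://www.example.com/images/some_image_3.jpg">';

$result = removeImagesExcept($html, [1]); // keep only the second image
```

Because the loop runs over a plain array rather than the live list, the index in $i always refers to the image's original position.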

Upvotes: 0

Saeed.Gh

Reputation: 1305

There is an option to avoid adding the html and body tags when you load an HTML file or string:

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
@$dom->loadHTML(file_get_contents('file.html'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
//@$dom->loadHTMLFile('file.html'); //Adds Html and body tags if not exist at the beginning

$nodes = $dom->getElementsByTagName("img");

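// Copy the live node list to an array first; removing nodes while
// iterating the list itself would skip elements.
foreach (iterator_to_array($nodes, false) as $i => $node) {
    if ($i == 1) {
        continue; // keep the second image
    }
    $node->parentNode->removeChild($node);
}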

return $dom->saveHTML();
//$dom->saveHtmlFile('file.html');

Related answers that this answer draws on:

  1. To delete an element (which you already do): https://stackoverflow.com/a/15272752/3086860
  2. To avoid adding the extra tags: https://stackoverflow.com/a/22490902/3086860
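An alternative to the libxml flags, in case they restructure a fragment that has several top-level elements (this is sometimes reported, and may depend on the libxml version), is to load the HTML normally and serialize only the children of <body>. The sketch below assumes that approach; bodyInnerHtml is a made-up helper name and the sample markup is illustrative.

```php
<?php
// Hypothetical helper: returns the markup inside <body>, without the
// doctype, <html>, or <body> wrappers that loadHTML() adds.
function bodyInnerHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() accepts a node and serializes just that subtree.
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>some text 1</p><img src="a.jpg"><p>some text 2</p>');

$fragment = bodyInnerHtml($dom);
```

This leaves the document itself untouched, so any node removal can happen before the call.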

Upvotes: 1

Tanveer Hussain

Reputation: 123

You can do this with an array. I modified your code so that it does not remove the second img tag.

$dom = new \DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);

// Zero-based indexes of the images to keep
$remainImages = array(1);

$nodes = $dom->getElementsByTagName("img");

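// Iterate backwards: the node list is live, so removing a node
// shifts the indexes of the nodes after it.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    if (!in_array($i, $remainImages)) {
        $image = $nodes->item($i);
        $image->parentNode->removeChild($image);
    }
}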

return $dom->saveHTML();
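If the images to remove are better identified by their URL than by their position, the same idea can be keyed on the src attribute instead. This is only a sketch under that assumption; removeImagesBySrc and the sample URLs are illustrative.

```php
<?php
// Hypothetical helper: removes every <img> whose src is in $srcsToRemove.
function removeImagesBySrc(string $html, array $srcsToRemove): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Work on an array copy so removals do not disturb the iteration.
    $images = iterator_to_array($dom->getElementsByTagName('img'), false);

    foreach ($images as $image) {
        if (in_array($image->getAttribute('src'), $srcsToRemove, true)) {
            $image->parentNode->removeChild($image);
        }
    }

    return $dom->saveHTML();
}

$html = '<p>a</p><img src="http://www.example.com/images/some_image_1.jpg">'
      . '<p>b</p><img src="http://www.example.com/images/some_image_2.jpg">';

$cleaned = removeImagesBySrc($html, ['http://www.example.com/images/some_image_1.jpg']);
```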

Upvotes: 1
