Remove a DomNode with a certain class in PHP

Question

I have an HTML document (string) which contains a div with the class "foo":



  ...


Blabla

   Text


   GARBAGE

I would like to remove only the divs with the class "foo". This is what I have so far:

$doc = new DOMDocument();
$doc->loadHTML($myhtml);
$xpath = new DOMXpath($doc);
$all = $xpath->query("/html");

$result = remove_elements_with_class('foo', $all);

What would the remove_elements_with_class function look like?

nickb · Accepted Answer

After:

$xpath = new DOMXpath($doc);

You need to:

Select all the nodes that you want to remove
Call DOMNode::removeChild() on each node's parent to remove that node

So, to accomplish the first task, you can issue an XPath query that finds all of the <div> nodes whose class is foo. That query would look like:

//div[contains(concat(' ', @class, ' '), ' foo ')]

Note that this handles elements that have more than one class, e.g. foo bar baz and baz foo bar. If this is undesirable and you want to match only elements whose class attribute is exactly foo, the query becomes:

//div[@class = 'foo']

And, in PHP, this becomes:

$nodes = $xpath->query( "//div[contains(concat(' ', @class, ' '), ' foo ')]");
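
To make the difference between the two queries concrete, here is a minimal sketch that runs both against a small sample document. The markup and variable names are made up for illustration and are not the asker's data:

```php
<?php
// Hypothetical sample markup, chosen to exercise both queries.
$html = '<html><body>'
      . '<div class="foo">a</div>'
      . '<div class="foo bar">b</div>'
      . '<div class="foobar">c</div>'
      . '<div class="bar">d</div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Token match: "foo" appears as a whole, space-separated class name.
$tokenMatches = $xpath->query("//div[contains(concat(' ', @class, ' '), ' foo ')]");

// Exact match: the class attribute is exactly the string "foo".
$exactMatches = $xpath->query("//div[@class = 'foo']");

echo $tokenMatches->length, "\n"; // 2 ("foo" and "foo bar", but not "foobar")
echo $exactMatches->length, "\n"; // 1 (only "foo")
```

Padding the attribute with spaces is what keeps foobar from matching, since a plain contains(@class, 'foo') would accept it.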

From here, you have all the nodes you want to remove in $nodes, so just iterate over them and remove each one from the document by grabbing the <div>'s parent node and removing its child node:

foreach( $nodes as $node) {
    $node->parentNode->removeChild( $node);
}
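
Putting the pieces together, the function from the question could be sketched roughly as follows. Note the assumptions: this version takes the DOMDocument itself rather than the node list passed in the question, returns the number of removed nodes, and expects a class name without quotes or whitespace. The sample markup reuses the words from the question, but its structure is a guess:

```php
<?php
/**
 * Removes every <div> that carries the given class.
 * Hypothetical signature: it accepts the DOMDocument, not a DOMNodeList.
 * Assumes $class contains no quote or whitespace characters.
 */
function remove_elements_with_class(string $class, DOMDocument $doc): int
{
    $xpath = new DOMXPath($doc);
    $query = "//div[contains(concat(' ', @class, ' '), ' $class ')]";

    // Copy the matches into an array before modifying the tree.
    $nodes = iterator_to_array($xpath->query($query));

    $removed = 0;
    foreach ($nodes as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    }

    return $removed;
}

// Assumed sample document; only the words come from the question.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><p>Blabla</p>'
    . '<div class="bar">Text</div>'
    . '<div class="foo">GARBAGE</div>'
    . '</body></html>'
);

$count  = remove_elements_with_class('foo', $doc);
$result = $doc->saveHTML();
```

After the call, $result holds the serialized document without the foo div, and $count reports how many nodes were removed.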

That's all it takes! You can see it working in this demo.

Edit: To keep the <div> and just remove its contents, set the node's nodeValue property to an empty string:

foreach( $nodes as $node) {
    $node->nodeValue = '';
}

You can see it working in this updated demo. You could also replace the <div> with a newly created <div>, as that approach seems more bulletproof, but this should work for your use case.
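
The replacement approach might look something like the sketch below. It is only one way to do it, and it assumes that carrying over the class attribute is enough; any other attributes on the original <div> are dropped. The sample markup is hypothetical:

```php
<?php
// Hypothetical sample markup with nested content inside the target <div>.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div class="foo">GARBAGE <b>more</b></div></body></html>');
$xpath = new DOMXPath($doc);

$query = "//div[contains(concat(' ', @class, ' '), ' foo ')]";

foreach (iterator_to_array($xpath->query($query)) as $node) {
    // Build an empty <div> and copy only the class attribute (an assumption).
    $empty = $doc->createElement('div');
    $empty->setAttribute('class', $node->getAttribute('class'));

    // replaceChild(new, old) is called on the parent of the old node.
    $node->parentNode->replaceChild($empty, $node);
}

// Re-run the query to inspect the replacement node.
$remaining = $xpath->query($query);
$html = $doc->saveHTML();
```

Because the new element starts out with no children, nothing from the old <div> can survive, including nested elements and comments.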
