I have an HTML document from which I want to remove specific tags, identified by a specific class. The tags have multiple classes. Here is a very simple example of the markup:
<style>.c{background-color:yellow}</style>
This is a <span class="a b c">string</span>.
This is <span class="a b c">another string</span>.
This is <span class="a b">yet another string</span>.
I want to parse that string (preferably with PHP's DOMDocument) and find only the <span> tags that have the class c, so that the result looks something like this:
<style>.c{background-color:yellow}</style>
This is a string.
This is another string.
This is <span class="a b">yet another string</span>.
Basically, I want to remove the tags around the text but keep the text in the document.
Update: I think I'm close, but it doesn't work for me:
$test = '<style>.c {background-color:yellow;}</style>' .
    'This is a <span class="a b c">string</span>.' .
    'This is <span class="a b c">another string</span>.' .
    'This is <span class="a b">yet another string</span>.';
$doc = new DOMDocument();
$doc->loadHTML($test);
$xpath = new DOMXPath($doc);
$query = "//span[contains(@class, 'c')]"; // thanks to Gordon
$oldnodes = $xpath->query($query);
foreach ($oldnodes as $oldnode) {
    $txt = $oldnode->nodeValue;
    $oldnode->parentNode->replaceChild($txt, $oldnode);
}
echo $doc->saveHTML();
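You're close. The problem is that replaceChild() expects a DOMNode as its first argument, but $txt is a plain string. Instead, create a document fragment, move the span's children into it, and replace the span with the fragment: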
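// Pad the normalized class list with spaces so ' c ' matches only the whole class name
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]";
$oldnodes = $xpath->query($query);
foreach ($oldnodes as $node) {
    $fragment = $doc->createDocumentFragment();
    // Move every child (text and elements) out of the span into the fragment
    while ($node->childNodes->length > 0) {
        $fragment->appendChild($node->childNodes->item(0));
    }
    // Replace the now-empty span with its former children
    $node->parentNode->replaceChild($fragment, $node);
}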
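Putting the question's input and the loop above together, here is a self-contained sketch you can run from the command line. The function name unwrapSpansWithClass() and its $class parameter are my own additions for illustration; the class name is interpolated straight into the XPath expression, so this assumes it contains no quote characters. The sample input is adapted from the question and includes a nested <b> element.

```php
<?php
// Hypothetical helper: unwrap every <span> whose class list contains the
// exact token $class, keeping the span's children where the span was.
// Assumes $class contains no quote characters (it is interpolated into XPath).
function unwrapSpansWithClass(DOMDocument $doc, string $class): void
{
    $xpath = new DOMXPath($doc);
    // Pad the normalized class attribute with spaces so ' c ' matches only
    // the whole token, not substrings such as "toc".
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    foreach ($xpath->query($query) as $node) {
        $fragment = $doc->createDocumentFragment();
        // childNodes is live: appending item(0) to the fragment removes it
        // from $node, so the loop ends once the span is empty.
        while ($node->childNodes->length > 0) {
            $fragment->appendChild($node->childNodes->item(0));
        }
        // Swap the now-empty span for its former children.
        $node->parentNode->replaceChild($fragment, $node);
    }
}

$html = '<style>.c{background-color:yellow}</style>'
    . '<p>This is a <span class="a b c">string</span>. '
    . 'This is <span class="a b c">foo <b>bar</b> baz</span>. '
    . 'This is <span class="a b">yet another string</span>.</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
unwrapSpansWithClass($doc, 'c');
echo $doc->saveHTML();
```

The span with only the classes a b is left untouched, while both spans carrying c are unwrapped.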
Each iteration replaces $node in the document, but you can still loop over the result set directly, without copying it or iterating in reverse; the replacement doesn't disturb the iteration.
This also handles the case where the span contains more than just text:
<span class="a b c">foo <b>bar</b> baz</span>
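For comparison, the question's own approach can be made to run by wrapping the string in a text node with DOMDocument::createTextNode() before calling replaceChild(). The sketch below (replaceSpansWithText() is a hypothetical name) shows why the fragment is the better choice: nodeValue returns only the concatenated text of the span, so nested elements such as <b> are lost.

```php
<?php
// Hypothetical helper: the question's approach, fixed so that replaceChild()
// receives a DOMText node instead of a plain string.
function replaceSpansWithText(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]";
    foreach ($xpath->query($query) as $node) {
        // nodeValue concatenates all descendant text, so any nested
        // elements (here the <b>) are discarded.
        $text = $doc->createTextNode($node->nodeValue);
        $node->parentNode->replaceChild($text, $node);
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<p><span class="a b c">foo <b>bar</b> baz</span></p>');
replaceSpansWithText($doc);
echo $doc->saveHTML(); // the paragraph now contains plain "foo bar baz"
```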
Note: I've since edited the XPath query to make it more robust. It now matches only the exact class c, not any class name that merely contains the letter c (such as toc).
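To see the difference between the two queries, this sketch counts how many spans each one matches in a small sample that includes a toc class; countMatches() is a hypothetical helper.

```php
<?php
// Hypothetical helper: load $html and count the nodes matched by $query.
function countMatches(string $html, string $query): int
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    return $xpath->query($query)->length;
}

$html = '<p><span class="a b c">x</span><span class="toc">y</span>'
      . '<span class="c">z</span><span class="a b">w</span></p>';

// Substring test on the raw attribute: also matches "toc".
$naive = "//span[contains(@class, 'c')]";
// Whole-token test: only classes that are exactly "c".
$exact = "//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]";

echo countMatches($html, $naive), "\n"; // 3: "a b c", "toc" and "c"
echo countMatches($html, $exact), "\n"; // 2: "a b c" and "c"
```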
What's odd is that you can remove nodes during the iteration without affecting the results (I know it has behaved this way before; I just don't know why it does here). But this is tested code and should work.
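A likely explanation for this behaviour (my assumption, worth checking against your PHP version): DOMXPath::query() returns a DOMNodeList holding a fixed set of matched nodes, whereas getElementsByTagName() returns a live list that is re-evaluated as the document changes. The sketch below unwraps four spans through each kind of list and counts how many spans remain; unwrapAll() and remainingSpans() are hypothetical helpers.

```php
<?php
// Hypothetical helper: move each node's children in front of it, then
// remove the (now empty) node from the tree.
function unwrapAll(iterable $nodes): void
{
    foreach ($nodes as $node) {
        while ($node->firstChild) {
            $node->parentNode->insertBefore($node->firstChild, $node);
        }
        $node->parentNode->removeChild($node);
    }
}

// Hypothetical helper: unwrap the spans through either kind of
// DOMNodeList and report how many spans are still in the document.
function remainingSpans(bool $useXPath): int
{
    $doc = new DOMDocument();
    $doc->loadHTML('<p><span>1</span><span>2</span><span>3</span><span>4</span></p>');
    $nodes = $useXPath
        ? (new DOMXPath($doc))->query('//span') // fixed set of matches
        : $doc->getElementsByTagName('span');   // live list
    unwrapAll($nodes);
    return $doc->getElementsByTagName('span')->length;
}

echo remainingSpans(true), "\n";  // 0: every span was unwrapped
echo remainingSpans(false), "\n"; // non-zero: the live list skips entries
```

With the live list, removing the current span shifts the remaining ones down an index, so the loop skips some of them; the XPath result is unaffected.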