ajay_full_stack

Reputation: 554

Removing a span with a specific class from HTML, but not its content, using regular expressions only

My PHP script generates the following HTML:

<div>
    <hr class="target"/>
    Remove target class <span class="target"> only and save this text</span>
    <span class='target test1 test2 '> Remove target class with span tag not this text</span>
    <span class="target"> multi-line / multi-paragraph content</span>
    <span class='target'>content without space after span tag</span>
</div>

I want to turn the above HTML into the following, using only PHP regular expressions (a business logic requirement):

<div>
    <hr/>
    Remove target class only and save this text
    Remove target class with span tag not this text
    multi-line / multi-paragraph content
    content without space after span tag
</div>

Notes: 1) The target class may be wrapped in single or double quotes. 2) A span may have multiple classes.

I used the following regex in PHP:

$data = preg_replace('#<(\w+) class=["\']target["\']>(.*)<\/\1>#', '\2', $data);

It handles most cases but fails on the following: 1) the hr tag; 2) it leaves an extra space when it removes the span tag; 3) multi-line content.

Thanks in advance.

Upvotes: 1

Views: 765

Answers (1)

Casimir et Hippolyte

Reputation: 89639

The way to do that is to use DOMDocument:

$html=<<<'EOD'
<div>
    <hr class="target"/>
    Remove target class <span class="target"> only and save this text</span>
    <span class='target test1 test2 '> Remove target class with span tag not this text</span>
    <span class="target"> multi-line / multi-paragraph content</span>
    <span class='target'>content without space after span tag</span>
</div>
EOD;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xp = new DOMXPath($dom);

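// get the span nodes whose class list contains the "target" token
// (a plain contains(@class, "target") would also match e.g. "targeted")
$spanNodeList = $xp->query('//span[contains(concat(" ", normalize-space(@class), " "), " target ")]');

foreach ($spanNodeList as $spanNode) {
    // move every child node out of the span, not only the first one
    while ($spanNode->firstChild) {
        $spanNode->parentNode->insertBefore($spanNode->firstChild, $spanNode);
    }
    // the span is now empty and can be dropped
    $spanNode->parentNode->removeChild($spanNode);
}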

// get the list of hr nodes
// (here I don't use XPath, but it can be done in the same way)
$hrNodes = $dom->getElementsByTagName('hr');

foreach ($hrNodes as $hrNode) {
    if ($hrNode->hasAttribute('class') && $hrNode->getAttribute('class') === 'target')
        $hrNode->removeAttribute('class');
}
echo $dom->saveHTML();
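
If the regex-only constraint from the question really cannot be relaxed, a rough sketch along the following lines might cover the sample input. It is not a general HTML parser: it assumes that class is the only attribute on the span, that target spans do not contain other spans, and that only hr tags need their class attribute dropped. The function name stripTargetClass is a hypothetical helper, not something from the question.

```php
<?php
// Hypothetical helper (name is an assumption, not from the question).
function stripTargetClass(string $html): string
{
    // 1) Unwrap <span> elements whose class list contains the token "target".
    //    Assumption: class is the only attribute and target spans are not nested.
    //    The s flag lets . match newlines (multi-line content); the \s* after
    //    the opening tag swallows the leading space that would otherwise remain.
    $spanPattern = '~<span\s+class=(["\'])(?:[^"\']*\s)?target(?:\s[^"\']*)?\1\s*>\s*(.*?)</span>~s';
    $html = preg_replace($spanPattern, '$2', $html) ?? $html;

    // 2) Drop a class attribute that is exactly "target" from <hr> tags,
    //    keeping whatever else is inside the tag (for example the trailing slash).
    $hrPattern = '~<(hr\b[^>]*?)\s+class=(["\'])\s*target\s*\2([^>]*)>~';
    return preg_replace($hrPattern, '<$1$3>', $html) ?? $html;
}

$html = <<<'EOD'
<div>
    <hr class="target"/>
    Remove target class <span class="target"> only and save this text</span>
    <span class='target test1 test2 '> Remove target class with span tag not this text</span>
    <span class="target"> multi-line / multi-paragraph content</span>
    <span class='target'>content without space after span tag</span>
</div>
EOD;

echo stripTargetClass($html);
```

The lazy (.*?) stops at the first closing span tag, which is why nested spans are outside what this sketch can handle; the DOM approach above does not have that limitation.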

Upvotes: 1
