Reputation: 2555
I have some HTML strings stored in PHP variables, with content like this:
"Some random text <p> <span></span> </p> and the random text continues"
"<p>Some random</p> <p> <span></span> </p> and the <span> </span>"
How do I strip out the <p> and <span> tags that contain only whitespace, to get something like this:
"Some random text and the random text continues"
"<p>Some random</p> and the "
Upvotes: 0
Views: 906
Reputation: 89557
You need to use recursion:
$data = <<<'EOD'
Some random text <p> <span> </span> </p> and the random text continues
<p>Some random</p> <p> <span></span> </p> and the <span> </span>
EOD;
$pattern = '~<(p|span)>(?>\s+|&nbsp;|(?R))*</\1>~';
$result = preg_replace($pattern, '', $data);
echo $result;
Pattern details:
~ # pattern delimiter
<(p|span)> # the tagname is captured in the capture group 1
(?> # open an atomic group: all the content that must be ignored
\s+ # whitespaces
| # OR
&nbsp; # the HTML non-breaking space entity
| # OR
(?R) # recursion
)* # repeat the atomic group
</\1> # closing tag: with a backreference to the capture group 1
~
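Note that removing the tags leaves the surrounding spaces in place, so the result has a double space where the tags used to be. Here is a minimal sketch of one way to tidy that up, assuming it is acceptable to collapse runs of spaces and tabs everywhere in the string. `stripEmptyTags` is a hypothetical helper name, not part of the code above:

```php
<?php
// Hypothetical helper: applies the recursive pattern, then collapses
// the runs of spaces/tabs left behind by the removed tags.
function stripEmptyTags(string $html): string
{
    $pattern = '~<(p|span)>(?>\s+|&nbsp;|(?R))*</\1>~';
    // preg_replace returns null on failure; fall back to the input in that case.
    $stripped = preg_replace($pattern, '', $html) ?? $html;

    // Assumption: collapsing whitespace runs is harmless for this content.
    return preg_replace('~[ \t]{2,}~', ' ', $stripped) ?? $stripped;
}

echo stripEmptyTags('Some random text <p> <span></span> </p> and the random text continues');
// Some random text and the random text continues
```

If the whitespace inside the string matters (for example inside `<pre>`), skip the second replacement and keep only the first one.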
Alternatively, with DOMDocument you can do it like this:
function removeEmptyTags($html, $tags = false) {
    // Suppress parser warnings for HTML fragments, then restore the previous setting
    $state = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    // Wrap the fragment in a <div> so the document has a single root element
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    libxml_use_internal_errors($state);

    $xp = new DOMXPath($dom);
    $xp->registerNamespace('php', 'http://php.net/xpath');
    $xp->registerPHPFunctions('isEmpty');

    // Build a predicate such as [name()="p" or name()="span"]
    $predicate = '';
    if ($tags) {
        $predicate = '[' . implode(' or ', array_map(function($i) {
            return 'name()="' . $i . '"';
        }, $tags)) . ']';
    }

    // Select the elements whose text content is empty according to isEmpty()
    $nodeList = $xp->query('//*'. $predicate . '[php:functionString("isEmpty", .)]');

    foreach ($nodeList as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself
    $result = '';
    foreach ($dom->documentElement->childNodes as $node) {
        $result .= $dom->saveHTML($node);
    }

    return $result;
}

function isEmpty($txt) {
    return preg_match('~^(?:\s+|&nbsp;)*$~iu', $txt) ? true : false;
}
echo removeEmptyTags($data, ['p', 'span']);
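One practical difference between the two approaches: the regex pattern only matches bare `<p>` and `<span>` tags, so an element such as `<p class="x"> </p>` is left alone, whereas the DOM approach selects by element name. The sketch below illustrates this with a variant that writes the emptiness test in plain XPath instead of a registered PHP callback. `removeBlankElements` is a hypothetical name, and the code assumes `$tags` is a non-empty list of trusted tag names:

```php
<?php
// Hypothetical variant, not the function above: the emptiness test is
// expressed in XPath 1.0 rather than through a PHP callback.
function removeBlankElements(string $html, array $tags): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Wrap the fragment so the parser sees a single root element.
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: $tags holds trusted, plain tag names (no quotes).
    $names = implode(' or ', array_map(fn($t) => "name()='$t'", $tags));
    // Map U+00A0 to a space, then check that no non-blank text remains.
    $query = "//*[{$names}][not(normalize-space(translate(., \"\u{A0}\", ' ')))]";

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query($query) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize the children of the wrapper <div>.
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

echo removeBlankElements('a <p class="x"> </p> b <span>text</span>', ['p', 'span']);
```

Like the function above, this looks only at text content, so an element that contains nothing but an image would also count as empty.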
Upvotes: 3