Reputation: 179
I'd like to create a regex pattern that matches an entire self-closing HTML tag in a string. It will be used in a PHP preg_replace call that removes all self-closing tags that are not normally self-closing (div, span, etc.) from an HTML DOM string.
Here's an example. In the string:
'<div id="someId"><div class="someClass" /></div>'
I would like to get the match:
'<div class="someClass" />'
But I keep getting either no match at all or this match:
'<div id="someId"><div class="someClass" />'
I have tried the following regex patterns and various combinations of them:
A simple regex pattern with the dot wildcard and excluding ">":
~<div.*?[^>].*?.*?/>~
A negative lookahead regex:
~<div(?!.*?>.*?)/>~
A negative lookbehind regex:
~<div.*?(?<!>).*?/>~
What am I missing?
Upvotes: 0
Views: 126
Reputation: 322
Use the following regex:
<div[^<]*\/>
This regex only checks that there is no < inside the self-closing tag. That becomes a problem if < is used inside the tag (e.g. in a quoted attribute value).
To allow a < inside a quoted string:
<div(?:[^<]*["'][^"']*["'][^<]*)\/>
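A quick way to see the difference between the two patterns is to run both against the question's example and against a tag with a < inside an attribute value. This is only a sketch: the variable names and the title="a<b" test string are made up for illustration, and ~ is used as the delimiter so the slash needs no escaping.

```php
<?php
// Pattern 1: no "<" allowed anywhere between "<div" and "/>".
$simple = '~<div[^<]*/>~';
// Pattern 2: allows one quoted attribute value that may contain "<".
$quoted = '~<div(?:[^<]*["\'][^"\']*["\'][^<]*)/>~';

// Example from the question, plus a hypothetical tag with "<" in a value.
$nested = '<div id="someId"><div class="someClass" /></div>';
$withLt = '<div title="a<b" />';

preg_match($simple, $nested, $m);
echo $m[0], PHP_EOL;                    // <div class="someClass" />

var_dump(preg_match($simple, $withLt)); // int(0): the "<" in the value blocks the match
var_dump(preg_match($quoted, $withLt)); // int(1)
```

Note that the second pattern requires at least one quoted value, so a bare <div /> would not match it.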
Upvotes: 0
Reputation: 179
It seems I overcomplicated this.
For my example, the following pattern yields the correct result:
~<div[^>]+?/>~
'div' can be replaced by a group of alternatives, e.g. (div|span), to cover additional tags if needed.
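For what it's worth, here is a minimal sketch of how that could look inside preg_replace with several tag names. The helper name removeSelfClosing and the default tag list are my own assumptions, not anything from the question, and it uses [^>]* rather than [^>]+? so that a bare <div/> is caught too.

```php
<?php
// Hypothetical helper: removes self-closing forms of the given (non-void) tags.
function removeSelfClosing(string $html, array $tags = ['div', 'span']): string
{
    // Escape each tag name for use in the pattern and join them as alternatives.
    $names = implode('|', array_map(fn($t) => preg_quote($t, '~'), $tags));
    // "<tag", a word boundary, any run of characters other than ">", then "/>".
    $pattern = '~<(?:' . $names . ')\b[^>]*/>~i';
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '', $html) ?? $html;
}

$html = '<div id="someId"><div class="someClass" /></div>';
echo removeSelfClosing($html), PHP_EOL; // <div id="someId"></div>
```

The \b keeps "div" from matching longer tag names that merely start with those letters. Like any regex on HTML, this will still trip over a > inside an attribute value.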
Upvotes: 0
Reputation: 43169
Use a parser approach instead:
<?php
$html = <<<DATA
<div id="someId">
<div class="someClass" />
</div>
DATA;
// Parse the fragment without adding <html>/<body> wrappers or a doctype
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// Select the div in question with an XPath expression
$xpath = new DOMXPath($dom);
$divs = $xpath->query("//div[@class='someClass']");
foreach ($divs as $div) {
    // do something useful here
}
?>
This sets up the DOM and looks for the div in question (via an XPath expression).
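If the goal is to remove the element rather than just find it, the loop body could detach each matched node from its parent. This is only a sketch under assumptions: it reuses the question's example string, and how the parser interprets the "/>" on a div may depend on the libxml version, so check the output on your own input.

```php
<?php
$html = '<div id="someId"><div class="someClass" /></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
// Copy the result into an array first, then detach each matched node.
foreach (iterator_to_array($xpath->query("//div[@class='someClass']")) as $div) {
    $div->parentNode->removeChild($div);
}

$result = $dom->saveHTML();
echo $result;
```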
Upvotes: 1