Rene Jorgensen

Reputation: 179

Matching range without one character with regex

I'd like to create a regex pattern that captures everything within a self-closing HTML tag in a string. It is to be used in a PHP preg_replace that removes all self-closing tags of elements that are normally not self-closing (e.g. div, span) from an HTML DOM string.

Here's an example. In the string:

'<div id="someId"><div class="someClass" /></div>'

I would like to get the match:

'<div class="someClass" />'

But I keep getting either no match at all or this match:

'<div id="someId"><div class="someClass" />'

I have tried the following regex patterns, as well as various combinations of them:

A simple regex pattern with the dot wildcard and excluding ">":

~<div.*?[^>].*?.*?/>~

A negative lookahead regex:

~<div(?!.*?>.*?)/>~

A negative lookbehind regex:

~<div.*?(?<!>).*?/>~

What am I missing?
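A small PHP check of why the attempts above overshoot: PCRE reports the leftmost match, so the search starts at the outer "<div", and a lazy ".*?" is still allowed to run across ">" until it reaches the first "/>". In the first attempt, "[^>]" constrains only a single character, so the ".*?" around it can still cross ">". Excluding ">" from the whole tag body makes the attempt at the outer "<div" fail, and the engine moves on to the inner one. The string used here is the question's example with the closing quote after someId restored (an assumption about the intended input).

```php
<?php
// The question's example string (closing quote after someId restored).
$html = '<div id="someId"><div class="someClass" /></div>';

// Lazy ".*?" may still consume ">", so a match starting at the first
// "<div" simply extends to the first "/>" it finds.
preg_match('~<div.*?/>~', $html, $m);
$lazyMatch = $m[0];
echo $lazyMatch, PHP_EOL; // <div id="someId"><div class="someClass" />

// "[^>]*" cannot cross the ">" that closes the outer tag, so the attempt
// at position 0 fails and the engine retries at the inner "<div".
preg_match('~<div[^>]*/>~', $html, $m);
$strictMatch = $m[0];
echo $strictMatch, PHP_EOL; // <div class="someClass" />
```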

Upvotes: 0

Views: 126

Answers (3)

Huntro

Reputation: 322

Use the following regex:

<div[^<]*\/>

This regex only checks that there is no < inside the self-closing tag, so it fails if a < appears inside the tag (e.g. in a quoted attribute value).
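A quick PHP sketch of the regex above, wrapped in "~" delimiters, showing both the case it handles and the limitation just described. The second input string, with a "<" inside a title attribute, is a hypothetical example.

```php
<?php
$pattern = '~<div[^<]*\/>~';

// On the question's example, the attempt at the outer "<div" finds no "/>"
// before the next "<", so only the inner tag matches.
$html = '<div id="someId"><div class="someClass" /></div>';
preg_match($pattern, $html, $m);
$match = $m[0];
echo $match, PHP_EOL; // <div class="someClass" />

// Limitation: a "<" inside an attribute value stops "[^<]*" early,
// so the self-closing tag is not matched at all.
$tricky = '<div id="someId"><div title="a < b" /></div>';
$count = preg_match($pattern, $tricky);
echo $count, PHP_EOL; // 0
```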

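To also allow a < inside a quoted attribute value, match quoted values as complete units and exclude bare <, > and quote characters everywhere else in the tag:

<div(?:[^<>"']|"[^"]*"|'[^']*')*\/>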
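A PHP check of why quoted values need to be matched as complete pairs. A pattern that only requires some quote pair somewhere between "<div" and "/>", such as <div(?:[^<]*["'][^"']*["'][^<]*)\/>, can still start at the outer "<div" on the question's example, because the pair can be the closing quote of someId plus the opening quote of someClass, with "><div class=" in between. The stricter pattern only lets complete quoted values contain "<" or ">". Both patterns are spelled out in the code, so this sketch does not depend on which one the answer text shows.

```php
<?php
$html = '<div id="someId"><div class="someClass" /></div>';

// Loose version: "[^\"']*" may span "><div class=" between two unrelated
// quotes, so the match starts at the outer "<div".
$loose = '~<div(?:[^<]*["\'][^"\']*["\'][^<]*)\/>~';
preg_match($loose, $html, $m);
$looseMatch = $m[0];

// Stricter version: outside complete quoted values, no "<", ">" or
// quote characters are allowed in the tag body.
$strict = '~<div(?:[^<>"\']|"[^"]*"|\'[^\']*\')*\/>~';
preg_match($strict, $html, $m);
$strictMatch = $m[0];

// The stricter pattern still accepts a "<" inside a quoted value.
preg_match($strict, '<div title="a < b" />', $m);
$quotedMatch = $m[0];

echo $looseMatch, PHP_EOL, $strictMatch, PHP_EOL, $quotedMatch, PHP_EOL;
```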

Upvotes: 0

Rene Jorgensen

Reputation: 179

It seems I was overcomplicating it.

For my example, this yields the correct result:

~<div[^>]+?/>~

'div' can be replaced by a group such as (div|span) to include additional tags if needed.
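A minimal preg_replace sketch of that suggestion, assuming div and span are the tags to strip. It uses a non-capturing group (?:div|span); a capturing (div|span) behaves the same for replacement. The input string with an extra span is a hypothetical example.

```php
<?php
// Hypothetical input with a self-closing div and a self-closing span.
$html = '<div id="someId"><div class="someClass" /><span class="x" /></div>';

// The answer's pattern with "div" replaced by an alternation group.
$pattern = '~<(?:div|span)[^>]+?/>~';

// Remove every self-closing div/span tag.
$cleaned = preg_replace($pattern, '', $html);
echo $cleaned, PHP_EOL; // <div id="someId"></div>
```

Two details worth knowing: "[^>]+?" needs at least one character, so a bare "<div/>" is not matched ("[^>]*" would cover it), and without a word boundary such as \b after the group, tag names that merely start with div or span also match.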

Upvotes: 0

Jan

Reputation: 43169

Use a parser approach instead:

<?php

$html = <<<DATA
<div id="someId">
    <div class="someClass" />
</div>
DATA;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

$divs = $xpath->query("//div[@class='someClass']");
foreach ($divs as $div) {
    // do something useful here, e.g. inspect or remove $div
}

?>

This builds a DOM from the HTML string and looks up the div in question with an XPath expression.
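A sketch of what "something useful" could be for the question's goal, assuming the aim is to remove the selected elements and serialize the rest. Note that a DOM does not record whether a tag was written with "/>", and depending on the libxml2 version a "/>" on a non-void element such as div may close the element immediately or be ignored as in HTML5 parsing, so the exact whitespace in the output can differ.

```php
<?php
$html = <<<DATA
<div id="someId">
    <div class="someClass" />
</div>
DATA;

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Copy the matches into an array before modifying the tree.
$divs = iterator_to_array($xpath->query("//div[@class='someClass']"));
foreach ($divs as $div) {
    $div->parentNode->removeChild($div);
}

$result = $dom->saveHTML();
echo $result;
```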

Upvotes: 1
