flaviu

ignore nested results in preg_match_all

I want to match the text inside the first (outermost) pair of double curly brackets, but ignore the nested pairs inside it.

{{some text here {{nested text here}} another text {{another nested text here}} final text}}

So the result should be:

some text here {{nested text here}} another text {{another nested text here}} final text

But this search:

preg_match_all("^\{{(.*?)\}}^", $string, $results);

gives me the nested ones inside the outer pair of brackets instead:

$results[0][0] = nested text here
$results[0][1] = another nested text here

Is there any way to achieve this with preg_match_all?

Answers (1)

Martin Ender

Nested structures often cause problems with regular expressions, since they make the language to be matched more complex than a regular language. PCRE is one of the engines that do allow matching them, because it supports recursion. If you never have single curly brackets inside your double brackets, you could use this pattern:

'/\{\{[^{}]*(?:(?R)[^{}]*)*\}\}/'

Here, (?R) recurses into the whole pattern, so every nested {{...}} is matched as one complete unit.
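As a rough illustration (a sketch assuming PHP 7 or later, not tested against the asker's real data), this runs the recursive pattern over the sample string from the question. The full match still includes the outer braces, so the sketch trims them with substr(); that trimming step is an illustrative addition, not part of the pattern itself.

```php
<?php
// Sample input from the question.
$string = '{{some text here {{nested text here}} another text {{another nested text here}} final text}}';

// Recursive pattern from this answer: (?R) re-enters the whole pattern,
// so each nested {{...}} is consumed as a single unit.
$pattern = '/\{\{[^{}]*(?:(?R)[^{}]*)*\}\}/';

preg_match_all($pattern, $string, $results);

// $results[0] holds the outermost matches, braces included.
foreach ($results[0] as $match) {
    // Drop the leading "{{" and trailing "}}" to get the requested text.
    echo substr($match, 2, -2), PHP_EOL;
}
```

With this input there is exactly one top-level match, so the loop should print the single line the question asks for.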

I am not sure how well PCRE optimizes this, but you can help a little by making all repetitions possessive. That suppresses backtracking, which is not necessary here because consecutive repetitions can never match the same characters:

'/\{\{[^{}]*+(?:(?R)[^{}]*+)*+\}\}/'
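A quick sanity check, again only a sketch: running the backtracking and the possessive pattern side by side on a hypothetical input with two top-level groups should give identical matches, because possessiveness only removes backtracking paths that could never lead to a match. The input string and the variable names $input, $a and $b are illustrative.

```php
<?php
// Hypothetical input with two top-level groups, one of them nested.
$input = 'a {{x {{y}} z}} b {{w}} c';

$backtracking = '/\{\{[^{}]*(?:(?R)[^{}]*)*\}\}/';
// Possessive variant: *+ never gives characters back. That is safe here because
// [^{}]*+ stops at a brace, and whatever follows it must start with a brace.
$possessive = '/\{\{[^{}]*+(?:(?R)[^{}]*+)*+\}\}/';

preg_match_all($backtracking, $input, $a);
preg_match_all($possessive, $input, $b);

// Both patterns should report the same two top-level groups.
echo implode(PHP_EOL, $b[0]), PHP_EOL;
var_dump($a[0] === $b[0]);
```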

If you do need to allow single brackets, you could do something similar with lookaheads, but this already shows why regular expressions aren't really made for nested structures (even if the engine supports them):

'/\{\{(?:(?!\{\{|\}\}).)*(?:(?R)(?:(?!\{\{|\}\}).)*)*\}\}/'

Now, instead of non-{} characters, we allow any character, unless it marks the beginning of a {{ or }}. Again, making the repetitions possessive might be a good idea:

'/\{\{(?:(?!\{\{|\}\}).)*+(?:(?R)(?:(?!\{\{|\}\}).)*+)*+\}\}/'
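To see why the lookahead version matters, here is a sketch comparing it with the possessive [^{}] pattern on a hypothetical input that contains a single brace pair inside the double brackets. The input and variable names are illustrative; only the lookahead pattern should return the whole outer group, while the other one can only find the inner {{c}}.

```php
<?php
// Hypothetical input: a single {b} sits inside the outer double brackets.
$input = '{{a {b} {{c}} d}}';

// Cannot match across a single { or }, so the outer group is not found.
$noSingles = '/\{\{[^{}]*+(?:(?R)[^{}]*+)*+\}\}/';
// Lookahead pattern from this answer: any character is allowed
// unless it starts a "{{" or a "}}".
$lookahead = '/\{\{(?:(?!\{\{|\}\}).)*+(?:(?R)(?:(?!\{\{|\}\}).)*+)*+\}\}/';

preg_match_all($noSingles, $input, $m1);
preg_match_all($lookahead, $input, $m2);

print_r($m1[0]);
print_r($m2[0]);
```

Note that . does not match newlines by default in PCRE, so if the bracketed text can span several lines you would probably want to add the s modifier (for example '/.../s').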
