Reputation: 3
In a project I have text with patterns like this:
{| text {| text |} text |}
more text
I want to get the first bracketed part, including anything nested inside it. For this I use preg_match with a recursive pattern. The following code already works fine for plain { } brackets:
preg_match('/\{((?>[^\{\}]+)|(?R))*\}/x',$text,$matches);
But when I add the "|" symbol, I get an empty result and I don't know why:
preg_match('/\{\|((?>[^\{\}]+)|(?R))*\|\}/x',$text,$matches);
I can't use the first solution because the text can also contain something like { text }. Can somebody tell me what I'm doing wrong here? Thanks.
Upvotes: 0
Views: 2694
Reputation: 77420
You've got a few suggestions for working regular expressions, but if you're wondering why your original regexp failed, read on. The problem arises when it comes time to match a closing "|}" tag. The (?>[^{}]+) (or [^{}]++) subexpression will match the "|", causing the |} subexpression to fail. With no backtracking in the subexpression, there's no way to recover from the failed match.
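To see the failure described above in isolation, here is a minimal sketch. The variable names and the second pattern are assumptions made for illustration, not part of the answer: the variant simply adds "|" to the negated class so the atomic group stops before "|}". That confirms the diagnosis, but it is not a general fix, since it rejects any lone "|" (and any plain { text }) inside the delimiters.

```php
<?php
// Sample text from the question.
$text = "{| text {| text |} text |}\nmore text";

// Original pattern: the atomic group (?>[^{}]+) also swallows the "|" of "|}",
// and being atomic it cannot give that character back.
$original = '/\{\|((?>[^{}]+)|(?R))*\|\}/';

// Illustrative variant (an assumption): exclude "|" from the class as well,
// so the atomic group stops right before the closing "|}".
$variant = '/\{\|((?>[^{}|]+)|(?R))*\|\}/';

$originalResult = preg_match($original, $text, $m1);
$variantResult  = preg_match($variant, $text, $m2);

var_dump($originalResult); // int(0): no match
var_dump($variantResult);  // int(1)
var_dump($m2[0]);          // string(26) "{| text {| text |} text |}"
```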
Upvotes: 1
Reputation: 75242
Try this:
'/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/'
In your original regex you use the character class [^{}] to match anything except a delimiter. That's fine when the delimiters are only one character, but yours are two characters. To not-match a multi-character sequence you need something like this:
(?:(?!\{\||\|\}).)++
The dot matches any character (including newlines, thanks to the (?s)), but only after the lookahead has determined that it's not part of a {| or |} sequence. I also dropped your atomic group ((?>...)) and replaced it with a possessive quantifier (++) to reduce clutter. But you should definitely use one or the other in that part of the regex to prevent catastrophic backtracking.
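As a quick check of the pattern above, the following sketch runs it against the question's sample, a string containing a plain { text } pair, and a multi-line string. The sample strings and the names $samples and $found are made up for illustration.

```php
<?php
// Pattern from this answer: a lookahead-guarded dot plus recursion via (?R).
$pattern = '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/';

// Hypothetical inputs: the question's text, stray single braces, and newlines.
$samples = [
    "{| text {| text |} text |}\nmore text",
    "before {| a { b } c |} after",
    "{| line one\n{| nested |}\nline two |} tail",
];

$found = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match, 0 on no match, false on error.
    $found[] = preg_match($pattern, $sample, $m) === 1 ? $m[0] : null;
}

print_r($found);
```

Each entry of $found should hold the outermost {| ... |} block of the corresponding sample, with nested blocks and plain braces left intact inside it.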
Upvotes: 3
Reputation: 33769
See PHP - help with my REGEX-based recursive function
To adapt it to your case, use a negative lookahead so that any character not starting a delimiter is consumed:
preg_match_all('/\{\|(?:(?!\{\||\|\}).|(?R))*\|\}/s', $text, $matches);
Upvotes: 0