Reputation: 9941
I have the following text
$text = 'This is a test to see if something(try_(this(once))) works';
I need to get something(try_(this(once)))
with regex from the text. I have the following issue:
My nesting will not remain constant; my text can be
something(try_(this(once)))
or something(try_this(once))
or something(try_thisonce)
I have tried a number of regex found across the site, but cannot get it working. Here is the closest I have come
$text = 'This is a test to see if something(try_(this(once))) works';
$output = preg_match_all('/(\(([^()]|(?R))*\))/', $text, $out);
?><pre><?php var_dump($out[0]); ?></pre><?php
This outputs
array(1) {
[0]=>
string(18) "(try_(this(once)))"
}
No matter where I add the word something
(for example '/something(\(([^()]|(?R))*\))/'
and '/(\something(([^()]|(?R))*\))/'
), I get an empty array or NULL
$text2 = 'This is a test to see if something(try_(this(once))) works';
$output2 = preg_match_all('/something\((.*?)\)/', $text2, $out2);
?><pre><?php var_dump($out2[0]); ?></pre><?php
With this code I do get the word something
back,
array(1) {
[0]=>
string(25) "something(try_(this(once)"
}
but then the expression stops and returns after the first closing )
which is expected, as this is not a recursive expression.
How do I recursively match and return a nested parenthesis with the word something
before the first opening (
, and, if possible, what happens when there might or might not be whitespace before the word something
, for example
something(try_(this(once)))
or something (try_(this(once)))
Upvotes: 5
Views: 188
Reputation: 89547
(?R)
isn't a magical incantation that produces a pattern able to handle balanced things (parentheses, for example). (?R)
is the same thing as (?0)
, an alias for "capture group zero", in other words, the whole pattern.
In the same way you can use (?1)
, (?2)
, etc. as aliases for the sub-patterns in group 1, 2, etc.
As an aside, note that (?0)
and (?R)
always sit inside the sub-pattern they refer to, since that sub-pattern is the whole pattern, so they always recurse. By contrast, (?1)
, (?2)
, etc. cause recursion only when they appear inside their own group; outside it, they are plain subroutine calls that simply save you from rewriting part of the pattern.
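To make the distinction concrete, here is a minimal sketch. The two patterns and their variable names are made up for illustration and are not part of the question: the first uses (?1) outside group 1 as a plain subroutine call, the second uses (?1) inside group 1, which is actual recursion.

```php
<?php
// (?1) used OUTSIDE group 1: a subroutine call that reuses \d{2}
// without writing it out three times. No recursion is involved.
$date = '/^(\d{2})-(?1)-(?1)$/';
$dateOk  = preg_match($date, '12-34-56'); // matches
$dateBad = preg_match($date, '12-34-5');  // last part is too short

// (?1) used INSIDE group 1: the group calls itself, so it can
// match parentheses nested to any depth.
$nested = '/^(\((?1)*\))$/';
$nestedOk  = preg_match($nested, '(()())'); // balanced
$nestedBad = preg_match($nested, '(()');    // one ")" is missing

var_dump($dateOk, $dateBad, $nestedOk, $nestedBad);
// int(1) int(0) int(1) int(0)
```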
something\((?:[^()]|(?R))*\)
doesn't work because it requires every opening parenthesis, nested or not, to be preceded by something
in your string.
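A short sketch of that failure, using the text from the question. The second input string is a made-up example in which the word is repeated before the inner parenthesis, which is the only shape this pattern accepts.

```php
<?php
$text = 'This is a test to see if something(try_(this(once))) works';

// (?R) re-runs the WHOLE pattern, so each nested "(" would need
// its own "something" prefix in the subject string.
$withR = '/something\((?:[^()]|(?R))*\)/';

$found = preg_match($withR, $text);
var_dump($found); // int(0): the inner "(this(once))" has no prefix

// Hypothetical input where the prefix is repeated before the inner "(".
$repeated = 'something(try_something(this))';
preg_match($withR, $repeated, $m);
var_dump($m[0]); // string(30) "something(try_something(this))"
```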
In conclusion, you can't use (?R)
here; you need to create a capture group that handles only the nested parentheses:
(\((?:[^()]|(?1))*\))
which can be written more efficiently:
(\([^()]*(?:(?1)[^()]*)*+\))
To finish, you only need to add something
, which is no longer included in the recursion:
something(\([^()]*(?:(?1)[^()]*)*+\))
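As a sketch of how this could be used from PHP, the snippet below runs the pattern over the variants listed in the question. The `\s*` between the word and the opening parenthesis is an addition made here, on the assumption that the optional whitespace sits between something and the (, as in the question's last example; the sample strings and variable names are illustrative.

```php
<?php
// Group 1 matches one balanced (...) block; (?1) recurses into
// group 1 only, so "something" is required just once.
// ASSUMPTION: \s* is added here to allow optional whitespace
// between the word and the opening parenthesis.
$pattern = '/something\s*(\([^()]*(?:(?1)[^()]*)*+\))/';

$samples = [
    'see if something(try_(this(once))) works',
    'see if something(try_this(once)) works',
    'see if something(try_thisonce) works',
    'see if something (try_(this(once))) works',
];

$results = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        $results[] = $m[0]; // $m[1] would hold only the (...) part
    }
}

print_r($results);
```

With these inputs, each entry of `$results` is the word something followed by its complete balanced parenthesised block.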
Note that if something
is a sub-pattern with an undetermined number of capture groups, it is handier to refer to the last opened capture group with a relative reference, like this:
som(eth)ing(\([^()]*(?:(?-1)[^()]*)*+\))
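A minimal check of the relative reference, assuming the same sample text as in the question. Here (?-1) sits inside group 2, so it refers to group 2 itself, regardless of how many groups come before it.

```php
<?php
// Group 1 is the extra (eth) group; group 2 is the balanced block.
// (?-1) means "the most recently opened group", which is group 2.
$pattern = '/som(eth)ing(\([^()]*(?:(?-1)[^()]*)*+\))/';

preg_match($pattern, 'see if something(try_(this(once))) works', $m);

var_dump($m[0]); // string(27) "something(try_(this(once)))"
var_dump($m[1]); // string(3) "eth"
var_dump($m[2]); // string(18) "(try_(this(once)))"
```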
Upvotes: 3
Reputation: 5119
This is a pretty literal way to match the desired text and handle the nested parentheses:
something\s*\(.*?\)+
https://regex101.com/r/cN6nQ9/1
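For comparison, a small sketch of how this non-recursive pattern behaves. It works when all the closing parentheses come in one run at the end, as in the question's examples. It does not count parentheses, though, so on a made-up input such as something(a(b)c) it stops at the first closing run.

```php
<?php
$pattern = '/something\s*\(.*?\)+/';

// All closing parentheses are adjacent: the whole block is matched.
preg_match($pattern, 'see if something (try_(this(once))) works', $m1);
var_dump($m1[0]); // string(28) "something (try_(this(once)))"

// Hypothetical input with text between the closing parentheses:
// the lazy .*? stops at the first ")" and the match ends early.
preg_match($pattern, 'something(a(b)c) tail', $m2);
var_dump($m2[0]); // string(14) "something(a(b)"
```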
Upvotes: 1
Reputation: 67968
[^() ]*(\((?:[^()]|(?1))*\))
You need to use (?1), which recurses into the 1st sub-pattern. See the demo:
https://regex101.com/r/cJ6zQ3/4
Upvotes: 3