Reputation: 944
I would like to match strings where foo appears within select([...]), but only if any parentheses that occur are balanced. E.g. match select(((foo))) or select(x(())(foo(x))()x((y)x)x()), but not select((foo) or select(x(foo)y().
I know I have to limit the maximum number of nested parentheses, and I came up with the following regular expression, which solves the problem for one additional pair of parentheses:
select\((?:
(?:[^()]*|[^()]*\([^()]*\)[^()]*)*
foo
(?:[^()]*|[^()]*\([^()]*\)[^()]*)*
|
(?:[^()]*|[^()]*\([^()]*\)[^()]*)*
\([^()]*foo[^()]*\)
(?:[^()]*|[^()]*\([^()]*\)[^()]*)*
)\)
That means that within select([...]), either match foo with zero or one level of parentheses before or after it, or match foo inside one pair of parentheses, again with zero or one level of parentheses before or after it.
Does anyone have a neater solution for this?
Expanding my regex to handle two additional pairs of parentheses looks like this:
select\((?:
(?:[^()]*|[^()]*\((?:[^()]*|[^()]*\([^()]*\)[^()]*)*\)[^()]*)*
foo
(?:[^()]*|[^()]*\((?:[^()]*|[^()]*\([^()]*\)[^()]*)*\)[^()]*)*
|
(?:[^()]*|[^()]*\((?:[^()]*|[^()]*\([^()]*\)[^()]*)*\)[^()]*)*
\((?:
(?:[^()]*|[^()]*\([^()]*\)[^()]*)*
foo
(?:[^()]*|[^()]*\([^()]*\)[^()]*)*
|
(?:[^()]*|[^()]*\([^()]*\)[^()]*)*
\([^()]*foo[^()]*\)
(?:[^()]*|[^()]*\([^()]*\)[^()]*)*
)\)
(?:[^()]*|[^()]*\((?:[^()]*|[^()]*\([^()]*\)[^()]*)*\)[^()]*)*
)\)
Here the innermost part is basically the previous regex, and the "no or one pair of parentheses" parts have been expanded to "no, one, or two pairs of parentheses".
I put this last regex on regex101: https://www.regex101.com/r/fJ6cR4/1
The problem is that this regex (and, even more so, any further expanded version) is quite time-consuming, so I'm hoping for better ideas.
Upvotes: 1
Views: 102
Reputation: 627607
As per Regular-expressions.info:
If you want a regex that does not find any matches in a string that contains unbalanced parentheses, then you need to use a subroutine call instead of recursion.
I have tried to adapt a regex on that site for your needs:
(?im)^(?![^()]*\(\))[^()\n]*+(\((?>[^()\n]|(?1))*+\)[^()\n]*)++$
See demo
You can use this regex in Sublime Text; unfortunately, it does not work in Notepad++.
To enforce the requirement that the match must begin with select(, contain foo and end with ), you just need to add a (?=select\(.*foo.*\)$) positive look-ahead at the beginning:
^(?=select\(.*foo.*\)$)(?![^()]*\(\))[^()\n]*+(\((?>[^()\n]|(?1))*+\)[^()\n]*)++$
See updated demo
.NET regex solution
You may use balancing groups together with conditional constructs in a .NET regex (say, for use with grepWin):
select\((?>(?<f>foo)|[^()]|(?<o>)\(|(?<-o>)\))*(?(o)(?!))(?(f)|(?!))\)
See the regex demo. As you can see, select(((foo))) and select(x(())(foo(x))()x((y)x)x()) are matched, but select((fo)) (no foo) and select((foo) (unbalanced parentheses) are not.
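If no .NET engine is at hand, the same bookkeeping can be sketched procedurally. The Python function below is only an illustration of the idea, not a translation of the regex by any tool: a counter stands in for the Group "o" stack and a flag stands in for Group "f". The name select_contains_foo and its parameters are made up for this sketch.

```python
def select_contains_foo(text: str, opener: str = "select(", needle: str = "foo") -> bool:
    """Return True if some `select(...)` in `text` is balanced and contains `needle`.

    Hypothetical helper: `depth` mimics the Group "o" stack, `seen` mimics Group "f".
    """
    start = text.find(opener)
    while start != -1:
        i = start + len(opener)
        depth = 0      # number of currently open inner parentheses
        seen = False   # becomes True once the needle has been consumed
        while i < len(text):
            if text.startswith(needle, i):
                seen = True
                i += len(needle)
            elif text[i] == "(":
                depth += 1
                i += 1
            elif text[i] == ")":
                if depth == 0:
                    break  # this ")" can only be the one closing select(
                depth -= 1
                i += 1
            else:
                i += 1
        # The loop only breaks on a ")" seen at depth 0, so i < len(text)
        # means the closing parenthesis of select( was found.
        if i < len(text) and depth == 0 and seen:
            return True
        start = text.find(opener, start + 1)  # try the next select( candidate
    return False


if __name__ == "__main__":
    for sample in ["select(((foo)))", "select((fo))", "select((foo)"]:
        print(sample, select_contains_foo(sample))
```

Unlike the nested regex in the question, this approach has no limit on nesting depth and runs in a single pass per select( candidate.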
Details
select\( - a select( substring
(?> - start of an atomic group:
  (?<f>foo)| - Group "f": a foo string, or
  [^()]| - any char but ( and ), or
  (?<o>)\(| - a ( char, and an empty string pushed on to Group "o" stack, or
  (?<-o>)\) - a ) char, and a value popped from Group "o" stack
)* - end of the atomic group, 0 or more repetitions
(?(o)(?!)) - the first conditional construct: if Group "o" stack is not empty, fail the match (nested parentheses check)
(?(f)|(?!)) - the second conditional construct: if Group "f" stack is empty, fail the match (Group "f" stack is only non-empty when foo has been matched)
\) - a ) char
Upvotes: 1
Reputation: 665574
There are two things you should do to simplify (and speed up) your regex:
(?: [^()]* | [^()]*\([^()]*\)[^()]* )*
is an example of catastrophic backtracking. The outer, repeated group should only have two alternatives: a sequence of non-parenthesis characters, or such a sequence between parentheses:
(?: [^()]+ | \([^()]*\) )*
You were mixing the non-parenthesis characters [^()]* into both alternatives.
Instead of writing …foo…|…\(foo\)…, you should write …(?:foo|\(foo\))… so that you don't have to repeat the lengthy … part.
With those two changes, your smaller expression becomes
select\(
(?: [^()]+ | \([^()]*\) )*
(?: foo | \([^()]*foo[^()]*\) )
(?: [^()]+ | \([^()]*\) )*
\)
I'll leave applying these to the larger expression to you.
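One way to apply both changes to deeper nesting without writing the pattern by hand is to build it in a loop. This is a minimal sketch in Python, assuming the `re` module; build_pattern and its depth parameter are hypothetical names, and with depth=1 it reproduces the smaller expression above (without the free-spacing whitespace).

```python
import re


def build_pattern(depth: int) -> re.Pattern:
    """Build the select(...foo...) regex for up to `depth` levels of inner parentheses.

    Hypothetical helper illustrating the two simplifications from this answer.
    """
    # balanced: text whose parentheses nest at most k levels (k = 0 to start)
    balanced = r"[^()]*"
    # with_foo: foo, optionally wrapped in up to k levels of parentheses
    with_foo = r"foo"
    for _ in range(depth):
        # Update with_foo first so it still uses the previous level's `balanced`.
        with_foo = rf"(?:foo|\({balanced}{with_foo}{balanced}\))"
        balanced = rf"(?:[^()]+|\({balanced}\))*"
    return re.compile(rf"select\({balanced}{with_foo}{balanced}\)")


if __name__ == "__main__":
    two_levels = build_pattern(2)
    for sample in ["select(((foo)))", "select((foo)", "select(x(foo)y()"]:
        print(sample, bool(two_levels.search(sample)))
```

The generated pattern grows with each level, so it is only meant for small nesting limits like the ones discussed in the question.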
Upvotes: 2