Reputation: 5968
I have the following regex:
\{(\w+)(?:\{(\w+))+\}+\}
I need it to match any of the following:
{a{b}}
{a{b{c}}}
{a{b{c{d...}}}}
But when I use the regex on, for example, the last one, it only captures two groups: a and d. It doesn't capture b and c, or any other words that might be in between.
How do I get the group to match each single one like:
group #1: a
group #2: b
group #3: c
group #4: d
group #5: etc...
or like
group #1: a
group #2: [b, c, d, etc...]
Also, how do I make it match only when there are as many { on the left as there are } on the right, and not match otherwise?
Thanks for the help,
David
Upvotes: 4
Views: 5279
Reputation: 9650
For regex flavours supporting recursion (PCRE, Ruby) you may employ the following generic pattern:
^({\w+(?1)?})$
It lets you check whether the input matches the defined pattern, but it does not capture the desired groups. See the Matching Balanced Constructs section in http://www.regular-expressions.info/recurse.html for details.
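As a minimal sketch of that check in Ruby, where the recursive call to group 1 is written \g<1> instead of (?1): the constant and method names below are hypothetical, the literal braces are escaped, and \A / \z replace ^ / $ because Ruby's ^ and $ are line anchors rather than string anchors.

```ruby
# Recursive check only: group 1 optionally calls itself for each nested level.
# BALANCED and balanced? are illustrative names, not part of the answer.
BALANCED = /\A(\{\w+\g<1>?\})\z/

def balanced?(input)
  BALANCED.match?(input)
end

puts balanced?("{a{b{c}}}")  # => true
puts balanced?("{a{b{c}}")   # => false (one closing brace is missing)
```

This only answers "is the string well formed?"; the words themselves are not available as separate groups.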
In order to capture the groups, we can convert the pattern-checking regex into a positive lookahead that is checked only once, at the start of the string ((?:^(?=({\w+(?1)?})$)|\G(?!\A))), and then just capture all the "words" using a global search:
(?:^(?=({\w+(?1)?})$)|\G(?!\A)){(\w+)
The a, b, c, etc. are now in the second capture group of each match.
Regex demo: https://regex101.com/r/2wsR10/2. PHP demo: https://ideone.com/UKTfcm.
Explanation:
(?: - start of alternation group
^ - start of string
(?= - start of positive lookahead
({\w+(?1)?}) - the generic pattern from above
$ - end of string
) - end of positive lookahead
| - or
\G - end of previous match
(?!\A) - ensure the previous \G does not match the start of the input if the first alternative failed
) - end of alternation group
{ - opening brace literally
(\w+) - a "word" captured in the second group.

Ruby has a different syntax for recursion, and the regex would be:
(?:^(?=({\w+\g<1>?})$)|\G(?!\A)){(\w+)
Demo: http://rubular.com/r/jOJRhwJvR4
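A short sketch of how the Ruby pattern might be used with String#scan, which restarts the search where the previous match ended so that \G can chain the matches. The names NESTED and nested_words are hypothetical, and the literal braces are escaped here for readability.

```ruby
# Lookahead validates the whole string once; \G then walks "{word" by "{word".
NESTED = /(?:^(?=(\{\w+\g<1>?\})$)|\G(?!\A))\{(\w+)/

def nested_words(input)
  # scan yields [group1, group2] per match; the word is in group 2.
  input.scan(NESTED).map(&:last)
end

p nested_words("{a{b{c{d}}}}")
p nested_words("{a{b}")
```

For the unbalanced second input the lookahead fails at the start of the string, so \G never gets a previous match to continue from and the result is an empty array.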
Upvotes: 3
Reputation: 626826
In .NET, a regex can 1) check balanced groups and 2) store a capture collection for each capturing group in a group stack.
With the following regex, you may extract all the texts inside each {...} only if the whole string, starting with { and ending with }, contains a balanced number of open/close curly braces:
^{(?:(?<c>[^{}]+)|(?<o>){|(?<-o>)})*(?(o)(?!))}$
See the regex demo.
Details:
^ - start of string
{ - an open brace
(?: - start of a group of alternatives:
(?<c>[^{}]+) - 1+ chars other than { and } captured into "c" group
| - or
(?<o>{) - a { is matched and a value is pushed to the Group "o" stack
| - or
(?<-o>}) - a } is matched and a value is popped from the Group "o" stack
)* - end of the alternation group, repeated 0+ times
(?(o)(?!)) - a conditional construct that fails the match if the Group "o" stack is not empty
} - a close }
$ - end of string.

using System.Linq;
using System.Text.RegularExpressions;

// Each capture of group "c" is one run of text found between braces.
var pattern = "^{(?:(?<c>[^{}]+)|(?<o>{)|(?<-o>}))*(?(o)(?!))}$";
var result = Regex.Matches("{a{bb{ccc{dd}}}}", pattern)
    .Cast<Match>().Select(p => p.Groups["c"].Captures)
    .ToList();
Output for {a{bb{ccc{dd}}}} is [a, bb, ccc, dd], while for {{a{bb{ccc{dd}}}} (a { is added at the beginning), the results are empty.
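For flavours without balancing groups, the same push/pop bookkeeping can be approximated outside the regex. The following is only a rough Ruby sketch of that idea, not the .NET mechanism itself: balanced_texts is a hypothetical helper, and an integer depth counter stands in for the Group "o" stack.

```ruby
# Mirrors the pattern's logic by hand: outer braces first, then
# text runs are collected while "{" pushes and "}" pops a depth counter.
def balanced_texts(input)
  return [] unless input.start_with?("{") && input.end_with?("}")

  depth = 0
  texts = []
  input[1...-1].scan(/[^{}]+|[{}]/) do |token|
    case token
    when "{" then depth += 1
    when "}"
      return [] if depth.zero? # nothing left to pop: unbalanced
      depth -= 1
    else
      texts << token
    end
  end
  depth.zero? ? texts : [] # leftover opens mean the input is unbalanced
end

p balanced_texts("{a{bb{ccc{dd}}}}")
p balanced_texts("{{a{bb{ccc{dd}}}}")
```

The first call collects the four text runs and the second returns an empty array, in line with the outputs described above.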
Upvotes: 3