Reputation: 2637
I routinely use the following regex for balanced matching:
my $np;
$np = qr{
\{
(?:
(?> [^\{\}]+ )
|
(??{ $np })
)*
\}
}x;
as, for example,
my $text = "{string{string1}{string2}}";
$text =~ /($np)/;
The question is whether it can be extended so that it also matches when string etc. contains a } or { preceded by a backslash. This means that escaped curly brackets should be ignored (treated like any other character) when the balanced matching is performed.
Upvotes: 0
Views: 120
Reputation: 126722
Sure. All you have to do is change the "any other character" expression to one that accepts escaped braces as well as "anything except a brace":
(?> (?: \\[{}] | [^{}] )+ )
Note also that the (??{ $np }) construct has long been superseded, and if you have version 10 or later of Perl 5 you can use the built-in recursion mechanism, whereby (?R) will recurse the entire expression from the start.
use strict;
use warnings;
use 5.010;
my $np = qr{
\{
(?:
(?>
(?: \\\\ | \\[{}] | [^{}] )*   # escaped backslash, escaped brace, or any non-brace
)
|
(?R)
)*
\}
}xs;
my $text = '{string{string1 \} test}{string2}}';
$text =~ /($np)/;
say $1;
output
{string{string1 \} test}{string2}}
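The escaped-backslash alternative matters when a backslash pair sits directly before a real closing brace. The following is a minimal sketch of that case, not part of the original answer; the names $with and $without are hypothetical and exist only for the comparison.

```perl
use strict;
use warnings;
use 5.010;

# Accepts escaped braces only (no alternative for an escaped backslash)
my $without = qr{
    \{
    (?: (?> (?: \\[{}] | [^{}] )+ ) | (?R) )*
    \}
}x;

# Also accepts an escaped backslash, tried before the escaped brace
my $with = qr{
    \{
    (?: (?> (?: \\\\ | \\[{}] | [^{}] )+ ) | (?R) )*
    \}
}x;

# Five characters: open brace, a, backslash, backslash, close brace.
# The two backslashes form one escaped backslash, so the brace is a real closer.
my $text = '{a\\\\}';

my $r_with    = $text =~ /($with)/    ? $1 : 'no match';
my $r_without = $text =~ /($without)/ ? $1 : 'no match';

say "with:    $r_with";
say "without: $r_without";
```

In this sketch the first pattern should match the whole string, while the second reads the final backslash and brace as an escaped brace and so finds no closing brace.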
Please note that I don't believe the "no backtrack" construct (?> ... ) is useful here. The intermediate string has been specified so that anything following must match the next token or the end of the string, and there are no non-greedy wildcards. But I am sure it does no harm so I have left it in.
Update
To allow for escaped opening braces before the first regular brace, it is simplest to write a separate regex for a "regular character", which is anything but an opening or closing brace, or an escaped anything.
Like this
use strict;
use warnings;
use 5.010;
my $reg_char = qr/(?: \\. | [^{}] )/xs; # Define what *isn't* a brace
my $np = qr{
\{
(?:
(?> $reg_char* )
|
(?R)
)*
\}
}x;
my $text = 'aaa \{ bbb {string{string1 \} test}{string2}}';
die unless $text =~ / $reg_char* ($np) /x;
say $1;
output
{string{string1 \} test}{string2}}
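Because (?R) recurses the entire expression, interpolating $np into a larger pattern makes the recursion include whatever surrounds it. A possible variant, sketched here under assumptions that are not in the answer above, is to recurse into a named group instead. The group name braces, the exclusion of the backslash from the character class, and the \G-anchored loop are all choices made for this illustration.

```perl
use strict;
use warnings;
use 5.010;

# Any escaped character, or anything that is neither a brace nor a backslash.
# Excluding the backslash is an assumption of this sketch.
my $reg_char = qr/(?: \\. | [^{}\\] )/xs;

# "braces" is a hypothetical group name; (?&braces) recurses into that group only.
my $np = qr{
    (?<braces>
        \{
        (?:
            (?> $reg_char+ )    # a run of regular characters
            |
            (?&braces)          # a nested balanced group
        )*
        \}
    )
}x;

my $text = 'aaa \{ bbb {string{string1 \} test}{string2}} ccc {x}';

# Collect every top-level balanced group. \G keeps each attempt anchored
# where the previous match ended, so a scan never starts in mid-escape.
my @groups;
while ( $text =~ / \G (?> $reg_char* ) $np /gx ) {
    push @groups, $+{braces};
}

say for @groups;
# {string{string1 \} test}{string2}}
# {x}
```

Reading the result from $+{braces} rather than a numbered capture keeps the code independent of how many other groups the surrounding pattern contains.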
Upvotes: 3