I'm trying to cook up a regular expression that matches balanced curly brackets and takes into account, and skips over, escaped curly brackets.
The following regex is not working, though. The script prints { def \} instead of the expected output, { def \} hij \\\} klm }. What am I doing wrong? How can I improve it?
my $str = 'abc { def \} hij \\\} klm } nop';
if ( $str =~ m/
(
\{
(?: \\\\
| \\[\{\}]
| [^\{\}]+
| (?-1)
)*
\}
)
/x
) { print $1, "\n" }
You can use the following regex, which supports any escaped character:
(?<=^|\\.|[^\\])({(?>\\.|[^{}]|(?1))*})
VERBOSE version with comments:
(?<=^|\\.|[^\\]) # Before `{` there is either the start of the string, an escaped character, or any character other than \
(
{ # Opening {
(?> # Start of atomic group
\\. # Any escaped symbol
|
[^{}] # any symbol but `{` and `}`
|
(?1) # Recurse the first subpattern
)* # repeat the atomic group 0 or more times
} # closing brace
)
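The one-line pattern was presented with an online regex demo, and its lookbehind has alternatives of different lengths (0, 2 and 1 characters). Perl versions before 5.30 reject that kind of variable-length lookbehind (newer versions accept it, at least experimentally), so this sketch, which is not part of the original answer, splits the alternation into ^ plus two fixed-length lookbehinds and escapes the literal braces. It also assumes the question's string has been corrected so that it really contains three backslashes before the second }.

```perl
use strict;
use warnings;

# Question's string, written with six backslashes so it really contains \\\}.
my $str = 'abc { def \} hij \\\\\\} klm } nop';

# Same idea as the one-line pattern, but the lookbehind alternation is
# split into separate fixed-length lookbehinds.
my ($block) = $str =~ m/
    (?: ^ | (?<=\\.) | (?<=[^\\]) )   # start of string, after an escaped char, or after a non-backslash
    (                                 # group 1
        \{                            # opening brace
        (?> \\. | [^{}] | (?1) )*     # escaped char, any non-brace char, or a nested block
        \}                            # closing brace
    )
/x;

print "$block\n";
```

Inside the atomic group the \\. alternative is tried first, so a backslash is always consumed together with the character after it before [^{}] gets a chance to take it alone.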
UPDATE
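Since the above regex may still treat an escaped opening brace as the start of a match (for example the { in \\\{, where the lookbehind sees the pair \\ and accepts the brace), you can use this instead: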
[^\\{}]*(?:\\.[^\\{}]*)*(?<!\\)({(?>\\.|[^{}]|(?1))*})
It consumes escaped sequences and any text that is not part of a valid block, and captures only the valid balanced substrings into Group 1.
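A minimal sketch of using the UPDATE pattern from Perl with a global match, not taken from the original answer. The repeated part is written in the usual unrolled-loop form (?:\\.[^\\{}]*)*, literal braces outside character classes are escaped, and both sample strings are made up for illustration. In list context, m//g returns Group 1 from every match, so only the valid blocks come back.

```perl
use strict;
use warnings;

# UPDATE pattern: the leading part eats plain text and escaped pairs,
# and (?<!\\) refuses a brace that directly follows a backslash.
my $re = qr/[^\\{}]*(?:\\.[^\\{}]*)*(?<!\\)(\{(?>\\.|[^{}]|(?1))*\})/;

# Hypothetical inputs (single-quoted, so \\ in the source is one backslash).
my @samples = (
    'a \{ b { c \} d } e',   # text: a \{ b { c \} d } e
    '\\\\\\{ x } { y }',     # text: \\\{ x } { y }  (the first brace is escaped)
);

my %blocks_for;
for my $text (@samples) {
    # Collect Group 1 from every successful match in $text.
    $blocks_for{$text} = [ $text =~ /$re/g ];
    print "$text => @{ $blocks_for{$text} }\n";
}
```

For the second sample the one-line pattern would accept the escaped brace (its lookbehind sees \\ before it), while this version only returns { y }.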
There are two problems here: the value of the string in $str, and the regex pattern.
Even within a single-quoted string, backslashes must be escaped when two appear together or when they appear as the last character in the string. A pair of backslashes is reduced to one, so the substring \\\} will generate \\} in the final string. To produce three backslashes followed by a closing brace, you need six backslashes in the code, \\\\\\} (although five will do, because a single backslash that is not followed by another backslash or by the closing quote is kept as-is).
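To see what each literal really contains, a small sketch (the variable names are hypothetical) can print the tail of the string written with three, six and five backslashes, along with its length:

```perl
use strict;
use warnings;

# In a single-quoted literal, \\ becomes one backslash, and a backslash
# followed by any other character (here "}") is kept as-is.
my $from_three = '\\\}';      # \\ -> \, then \} stays: \\}  (2 backslashes)
my $from_six   = '\\\\\\}';   # three \\ pairs: \\\}  (3 backslashes)
my $from_five  = '\\\\\}';    # two \\ pairs, then \} stays: \\\}  (3 backslashes)

for my $pair ( [ three => $from_three ], [ six => $from_six ], [ five => $from_five ] ) {
    my ( $label, $value ) = @$pair;
    printf "%-5s %-5s (length %d)\n", $label, $value, length $value;
}
```

The three-backslash version comes out one character shorter, which is why the question's string never contained the sequence the regex was expected to find.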
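Your regex pattern is incorrect because the character class [^{}] also matches a single backslash, so the backslash of an escaped brace can be consumed before the escaped-brace alternative gets a chance to see it. Here the alternative [^{}]+ matches def \ from the string, leaving the } detached from its backslash, and that } is then taken as the closing brace. Excluding the backslash from the class, as in [^{}\\]+, means every backslash has to be matched by the \\. alternative together with the character that follows it.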
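The effect of the character class shows up when both alternatives run against the same text. This sketch is not part of the original answer: it uses the corrected six-backslash string, the question's alternatives for the broken pattern, and the alternatives from the program that follows for the fixed one.

```perl
use strict;
use warnings;

# Question's text, with six backslashes so it really contains \\\}.
my $str = 'abc { def \} hij \\\\\\} klm } nop';

# Question's alternatives: [^\{\}]+ can swallow the backslash of "\}".
my ($broken) = $str =~ m/
    (
        \{
        (?: \\\\ | \\[\{\}] | [^\{\}]+ | (?-1) )*
        \}
    )
/x;

# Backslash excluded from the class: every backslash goes through \\.
my ($fixed) = $str =~ m/
    (
        \{
        (?: [^{}\\]+ | \\. | (?-1) )*
        \}
    )
/xs;

print "broken: $broken\n";
print "fixed:  $fixed\n";
```

Even with the string corrected, the question's pattern still stops at the first escaped brace; only the change to the character class lets the match run to the real closing brace.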
This program does what you need:
use strict;
use warnings 'all';
my $str = 'abc { def \} hij \\\\\\} klm } nop';
print $str, "\n";
if ( $str =~ m/
(
\{
(?:
[^{}\\]+ |
\\. |
(?-1)
)*
\}
)
/xs ) {
print $1, "\n";
}
Output:
abc { def \} hij \\\} klm } nop
{ def \} hij \\\} klm }
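The recursive (?-1) alternative is what lets the same pattern cope with nesting, since it re-enters the most recently opened capture group (group 1). A minimal sketch with a made-up nested input, not from the original answer:

```perl
use strict;
use warnings;

# Hypothetical input: a block nested inside the outer one, plus an escaped brace.
my $nested = 'x { a { b \} } c } y';

# Same pattern as the program above; (?-1) recurses into group 1.
my ($outer) = $nested =~ m/
    (
        \{
        (?: [^{}\\]+ | \\. | (?-1) )*
        \}
    )
/xs;

print "$outer\n";
```

The inner { b \} } is matched by the recursive call, so the escaped \} inside it does not close either block.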