n.r.

Reputation: 1900

How to match balanced curly brackets skipping escaped ones?

I'm trying to cook up a regular expression to match balanced curly brackets which takes into account, and skips over, escaped curly brackets.

The following regex is not working though. The script prints { def \} instead of the expected output: { def \} hij \\\} klm }. What am I doing wrong? How can I improve it?

my $str = 'abc { def \} hij \\\} klm } nop';

if ( $str =~ m/
              (
                \{
                  (?: \\\\
                  |   \\[\{\}]
                  |   [^\{\}]+
                  |   (?-1)
                  )*
                \}
              )
              /x
) { print $1, "\n" }

Upvotes: 2

Views: 661

Answers (2)

Wiktor Stribiżew

Reputation: 626699

You can use the following regex that will support any escaped symbols:

(?<=^|\\.|[^\\])({(?>\\.|[^{}]|(?1))*})

VERBOSE version with comments:

(?<=^|\\.|[^\\]) # Before `{` there is either start of string, escaped entity or not a \
(
   {            # Opening {
     (?>        # Start of atomic group
          \\.   # Any escaped symbol 
         |      
          [^{}] # any symbol but `{` and `}`
         | 
          (?1)  # Recurse the first subpattern
     )*         # repeat the atomic group 0 or more times
   }            # closing brace
)

See the regex demo

UPDATE

Since the above regex may match an escaped opening brace as first character, you may use

[^\\{}]*(?:\\.[^\\{}]*)*(?<!\\)({(?>\\.|[^{}]|(?1))*})

See the regex demo

The leading part of the pattern consumes escaped sequences and any other text in front of a block as part of the overall match, so only the balanced substrings are captured into Group 1.
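As a rough illustration of how such a pattern might be used from Perl, here is a minimal sketch. It assumes the prefix is written as the unrolled loop [^\\{}]*(?:\\.[^\\{}]*)*, and it escapes the literal braces outside character classes, since some Perl versions warn about or reject an unescaped {. The sample string and the variable names $re, $str and @blocks are made up for the example.

```perl
use strict;
use warnings;

# Sketch only: the prefix skips plain text and escaped characters,
# group 1 captures one balanced {...} block.
my $re = qr/
    [^\\{}]* (?: \\. [^\\{}]* )*   # skip plain text and escaped characters
    (?<! \\ )                      # the brace must not follow a backslash
    (                              # group 1, one balanced block
      \{
        (?> \\. | [^{}] | (?1) )*  # escaped char, plain char, or nested block
      \}
    )
/xs;

# Hypothetical sample input: one nested block, one escaped brace, one plain block.
my $str = 'a {b {c} d} e \{ f {g}';

# In list context a global match returns every group 1 capture.
my @blocks = $str =~ /$re/g;

print "$_\n" for @blocks;
# {b {c} d}
# {g}
```

The escaped \{ is swallowed by the prefix, so it never starts a block of its own.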

Upvotes: 2

Borodin

Reputation: 126722

There are two problems here: the value of the string in $str, and the regex pattern

Even within a single-quoted string, backslashes must be escaped when two appear together or when one appears as the last character in the string. A pair of backslashes is reduced to one, so the substring \\\} produces \\} in the final string. To produce three backslashes followed by a closing brace, you need six backslashes in the code, \\\\\\} (although five will do, because a single backslash directly before the } is kept as it is)
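A quick way to check this is to count the backslashes that actually end up in the string. The following is only a small sketch; the helper count_backslashes and the variable names are invented for the illustration.

```perl
use strict;
use warnings;

# In single quotes, \\ becomes one backslash; a lone backslash before
# any character other than \ or the quote delimiter is kept as it is.
my $three_in_source = 'x \\\} y';      # three backslashes in the source
my $six_in_source   = 'x \\\\\\} y';   # six backslashes in the source
my $five_in_source  = 'x \\\\\} y';    # five backslashes in the source

# Hypothetical helper: number of backslashes stored in the string.
sub count_backslashes {
    my ($s) = @_;
    return scalar( () = $s =~ /\\/g );
}

printf "%d %d %d\n",
    map { count_backslashes($_) } $three_in_source, $six_in_source, $five_in_source;
# prints: 2 3 3
```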

Your regex pattern is incorrect because the character class [^\{\}] also matches a backslash, which prevents the backslash from being identified as part of an escaped brace sequence. So the alternative [^\{\}]+ is matching def \ from the string, leaving the } detached from its backslash. Excluding the backslash from the class, as in [^{}\\]+, avoids this
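The difference between the two character classes can be seen in isolation. This is just a sketch on a shortened version of the string; the names $greedy and $careful are made up for the example.

```perl
use strict;
use warnings;

my $str = 'abc { def \} hij';

# Without excluding the backslash, the class runs straight through it ...
my ($greedy) = $str =~ / \{ ( [^{}]+ ) /x;

# ... whereas excluding it stops just before the escape sequence.
my ($careful) = $str =~ / \{ ( [^{}\\]+ ) /x;

print "[", $greedy,  "]\n";   # [ def \]
print "[", $careful, "]\n";   # [ def ]
```

In the first case the } that follows is left on its own, so the pattern treats it as the closing brace.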

This program does what you need

use strict;
use warnings 'all';

my $str = 'abc { def \} hij \\\\\\} klm } nop';

print $str, "\n";

if ( $str =~ m/
              (
                \{
                  (?:
                  [^{}\\]+  |
                  \\.       |
                  (?-1)
                  )*
                \}
              )
              /xs ) {

    print $1, "\n";
}

output

abc { def \} hij \\\} klm } nop
{ def \} hij \\\} klm }

Upvotes: 3
