highsciguy

Reputation: 2637

Enhancing Perl regular expression for balanced matching

I routinely use the following regex for balanced matching:

my $np;
$np = qr{
                \{
                (?:
                        (?> [^\{\}]+ )
                    |
                        (??{ $np })
                )*
                \}
            }x;

as, for example,

my $text = "{string{string1}{string2}}";
$text =~ /($np)/;

The question is whether it can be extended so that it also matches when the strings contain a { or } preceded by a backslash. In other words, escaped curly brackets should be ignored (treated like any other character) when the balanced matching is performed.

Upvotes: 0

Views: 120

Answers (1)

Borodin

Reputation: 126722

Sure. All you have to do is change the "any other character" expression to one that accepts escaped braces as well as "anything except a brace":

(?>  (?: \\[{}] | [^{}] )+  )

Note also that the (??{ $np }) construct has long been superseded, and if you have version 10 or later of Perl 5 you can use the built-in recursion mechanism, whereby (?R) will recurse the entire expression from the start.

use strict;
use warnings;
use 5.010;

my $np = qr{
  \{
    (?:
      (?>
        (?: \\\\ | \\\{ | \\\} | [^{}] )*
      )
    |
      (?R)
    )*
  \}
}xs; 

my $text = '{string{string1 \} test}{string2}}';
$text =~ /($np)/;

say $1;

output

{string{string1 \} test}{string2}}

Please note that the "no backtrack" construct (?> ... ) is doing real work here. Because [^{}] also matches a lone backslash, a non-atomic version could backtrack, treat the backslash of \} as an ordinary character, and then accept the brace after it as a real closing brace. The atomic group rules that out, so it should stay in.
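Whether the atomic group matters can be checked directly by running the same pattern with and without it against input whose only closing brace is escaped. This is a minimal sketch, not part of the original answer: the names $atomic and $plain and the sample text are made up for illustration, and \\. is used as shorthand for "an escaped anything".

```perl
use strict;
use warnings;
use 5.010;

# Balanced-brace pattern with the atomic group around the regular characters.
my $atomic = qr{
  \{
    (?:
      (?> (?: \\. | [^{}] )+ )
    |
      (?R)
    )*
  \}
}xs;

# The same pattern with an ordinary (backtracking) group instead.
my $plain = qr{
  \{
    (?:
      (?: \\. | [^{}] )+
    |
      (?R)
    )*
  \}
}xs;

# The only closing brace is escaped, so nothing here is balanced.
my $text = '{a \}';

say 'atomic: ', ($text =~ $atomic ? 'match' : 'no match');
say 'plain:  ', ($text =~ $plain  ? 'match' : 'no match');
```

The atomic version should report no match, while the plain version backtracks into the escape sequence and matches the whole string. Another option, under the same assumptions, is to exclude the backslash from the negated class ([^{}\\]) so that the two alternatives can never overlap.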


Update

To allow for escaped opening braces before the first regular brace, it is simplest to write a separate regex for a "regular character": either a backslash followed by any character, or any character that is not an opening or closing brace.

Like this:

use strict;
use warnings;
use 5.010;

my $reg_char = qr/(?: \\. | [^{}] )/xs;  # An escaped anything, or any non-brace character

my $np = qr{
  \{
    (?:
      (?> $reg_char* )
    |
      (?R)
    )*
  \}
}x; 

my $text = 'aaa \{ bbb {string{string1 \} test}{string2}}';
die unless $text =~ / $reg_char* ($np) /x;

say $1;

output

{string{string1 \} test}{string2}}
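
One thing to keep in mind is that (?R) recurses the whole pattern, including anything the qr object has been embedded in. A possible variation, sketched here as an assumption rather than as part of the original answer, is to recurse into a named group with (?&NAME) so the recursion stays confined to the brace expression. The group name braces, the array @groups, and the sample text are all illustrative.

```perl
use strict;
use warnings;
use 5.010;

# An escaped anything, or any non-brace character.
my $reg_char = qr/(?: \\. | [^{}] )/xs;

# Recurse into the named group only, not into the enclosing pattern.
my $np = qr{
  (?<braces>
    \{
      (?:
        (?> $reg_char+ )
      |
        (?&braces)
      )*
    \}
  )
}x;

my $text = 'aaa \{ bbb {one{two \} x}} ccc {three}';

# Walk the string: skip regular characters atomically, then grab one group.
my @groups;
while ($text =~ / \G (?> $reg_char* ) $np /gcx) {
    push @groups, $+{braces};
}

say for @groups;
```

With this input the loop should collect {one{two \} x}} and {three}, skipping the escaped \{ near the start. The leading skip is made atomic so that it cannot backtrack and hand an escaped brace to the group pattern.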

Upvotes: 3
