Tohiko

Reputation: 1982

Matching text not enclosed in parentheses

I am still learning Perl, so apologies if this is an obvious question. Is there a way to match text that is NOT enclosed in parentheses? For example, in the text below, searching for foo should match on the second line only.

(bar foo bar)
bar foo (
bar foo 
   (bar) (foo)
)

Upvotes: 3

Views: 359

Answers (2)

zdim

Reputation: 66901

This is far from "obvious"; quite the contrary. There is no direct way to say "don't match" for a complex pattern, although there is good support at the character level, with [^a], \S, etc. Regex is primarily about matching things, not about not matching them.

One approach is to match those (possibly nested) delimiters and take everything outside them.

A good tool for finding nested delimiters is the core module Text::Balanced. As it matches, it can also return the substring before the match and the rest of the string after it.
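To illustrate the character-level negation mentioned above, here is a minimal sketch. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = "bar foo (baz)";

# [^()] matches any single character that is not a parenthesis,
# so ^([^()]+) captures the leading run that contains no parentheses.
my ($run) = $line =~ /^([^()]+)/;

# \S matches any single non-whitespace character; \S+ is one "word".
my @words = $line =~ /(\S+)/g;

print "run:   '$run'\n";
print "words: @words\n";
```

Both constructs negate a single character class. Neither can express "not inside a balanced pair of parentheses", which is why the rest of this answer takes a different route.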

use warnings;
use strict;
use feature 'say';

use Text::Balanced qw(extract_bracketed);

my $text = <<'END';
(bar foo bar)
bar foo (
bar foo 
   (bar) (foo)
   )
END

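my ($match, $before);
my $remainder = $text;
while (1) {
    # In list context: (extracted group, rest of the string, skipped prefix).
    # The prefix pattern '[^(]*' skips everything up to the next opening paren.
    ($match, $remainder, $before) = extract_bracketed($remainder, '(', '[^(]*');
    # On failure $before is undef and $remainder still holds the unmatched tail.
    print $before // $remainder;
    last if not defined $match;
}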

In list context, extract_bracketed returns the match, the remainder of the string ($remainder), and the substring skipped before the match ($before), so we keep matching in the remainder until nothing is left to extract.
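The same loop can be wrapped in a small function that collects the unenclosed text instead of printing it, after which an ordinary match finds foo. This is only a sketch: the name text_outside_parens is hypothetical, and text after an unbalanced opening parenthesis is simply kept as-is.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: return $text with every balanced (...) group removed.
sub text_outside_parens {
    my ($text) = @_;
    my $outside = '';
    while (1) {
        # List context: (extracted group, rest of the string, skipped prefix).
        my ($match, $rest, $before) = extract_bracketed($text, '(', '[^(]*');
        if (!defined $match) {
            # Nothing (more) to extract: whatever remains is kept as outside text.
            $outside .= $rest;
            last;
        }
        $outside .= $before;   # text that preceded the group
        $text = $rest;         # continue after the group
    }
    return $outside;
}

my $sample = "(bar foo bar)\nbar foo (\nbar foo \n   (bar) (foo)\n   )\n";
my $outside = text_outside_parens($sample);

# Count the occurrences of foo in the text that is left.
my $count = () = $outside =~ /foo/g;
print "foo occurs $count time(s) outside parentheses\n";
```

Note that offsets in the returned string no longer correspond to offsets in the original, so this suits "is it there?" checks better than locating the match.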

Taken from this post, where there are more details and another way, using Regexp::Common.

Upvotes: 4

ikegami

Reputation: 386206

Regex patterns have an implicit leading \G(?s:.)*? ("skip characters until a match is found"). The following extends that definition so that a balanced, possibly nested, parenthesized group counts as a single character to skip.

use feature 'say';   # say needs to be enabled explicitly (Perl 5.10+)

while (
   $string =~ m{
      \G (?&MEGA_DOT)*?

      ( foo )

      (?(DEFINE)
         (?<MEGA_DOT> [^()] | \( (?&MEGA_DOT)*+ \) )
      )
   }xg
) {
   say "Found a match at pos $-[1].";
}
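For experimenting, the pattern above can be wrapped in a function and run against the text from the question. This is a sketch under assumptions: the helper name positions_outside_parens and its arguments are hypothetical, and the search word is interpolated with \Q...\E so it is treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets of every $word found outside balanced parentheses.
sub positions_outside_parens {
    my ($string, $word) = @_;
    my @positions;
    while (
        $string =~ m{
            \G (?&MEGA_DOT)*?      # lazily skip plain characters and whole (...) groups

            ( \Q$word\E )          # capture group 1: the literal search word

            (?(DEFINE)
                (?<MEGA_DOT> [^()] | \( (?&MEGA_DOT)*+ \) )
            )
        }xg
    ) {
        push @positions, $-[1];    # start offset of capture group 1
    }
    return @positions;
}

my $sample = "(bar foo bar)\nbar foo (\nbar foo \n   (bar) (foo)\n   )\n";
my @found = positions_outside_parens($sample, 'foo');
print "Found at: @found\n";
```

For the sample text this reports the single offset 18, which is the foo on the second line. Because \G pins each attempt to the end of the previous match, the search stops at the first unbalanced parenthesis rather than retrying from inside it.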

Upvotes: 5
