Dr.Avalanche

Reputation: 1996

Perl remove text within () characters

I have a variable which may or may not contain text within brackets, e.g.

blah blah (soups up)

I want to remove anything within and including the brackets, so for this example I'd be left with:

blah blah

I tried the following substitution but it didn't work as expected:

$desc =~ s/(.*?)//gs;
print "fixed desc: $desc\n";

prints:

fixed desc:

As per the discussion, anything inside the brackets, including nested brackets, should be blitzed, e.g.:

blah blah (soups up (tomato!) )

Upvotes: 0

Views: 117

Answers (2)

Schwern

Reputation: 164699

Matching balanced text is a classic hard regex problem. For example, how do you handle keep (remove) keep (remove) without also removing the keep in the middle? Fortunately it's gotten much easier, and perlfaq4 covers it. You have two choices.

The first is to use recursive regexes, introduced in Perl 5.10. (?R) recurses into the whole pattern.

m{
    \(                        # Open paren
       (?>
           [^()]   |          # No nested parens OR
           (?R)               # Recurse to check for balanced parens
       )*
    \)                        # Close paren
 }x;

However, this doesn't deal with escapes like (this is \) all in parens).
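A minimal sketch putting the recursive pattern to work on the strings from the question. It compiles the same pattern with qr// (the name $parens is just for illustration) and removes every balanced group with s///g. Since (?R) recurses into the whole pattern, the substitution here contains nothing but $parens.

```perl
use v5.10;
use strict;
use warnings;

# Same recursive pattern as above, compiled once with qr// for reuse.
# (?R) recurses into the whole pattern, so keep the substitution below
# to just $parens.
my $parens = qr{
    \(                  # Open paren
       (?>
           [^()]   |    # No nested parens OR
           (?R)         # Recurse to check for balanced parens
       )*
    \)                  # Close paren
}x;

my @inputs = (
    'blah blah (soups up)',
    'blah blah (soups up (tomato!) )',
    'keep (remove) keep (remove)',
);

my @results;
for my $s (@inputs) {
    (my $out = $s) =~ s/$parens//g;   # remove every balanced (...) group
    push @results, $out;
    say "'$out'";
}
```

Surrounding spaces are left alone (the last input becomes 'keep  keep '), so you may want to tidy whitespace afterwards.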

Rather than go into the regex contortions necessary to handle escapes, use a module to build that regex for you. Regexp::Common::balanced and Regexp::Common::delimited can do that; Regexp::Common solves a lot of other hard regex problems too, including handling escapes.

use v5.10;
use strict;
use warnings;
use Regexp::Common;

my $re = $RE{balanced}{-parens=>"()"};

my $s = "blah blah (soups up (tomato!) )";

# \s* also removes the space before the group; /g handles several groups
$s =~ s{\s*$re}{}g;

say $s;    # "blah blah"

Upvotes: 4

heyhey2k

Reputation: 66

Well, the first thing to note in the simplest case, if you aren't yet worried about the edge cases mentioned above, is that brackets are also used for grouping and capturing (backreferences) in regexes. So you'll need to escape them in your substitution like so:

$desc =~ s/\(.*\)//gs;
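A minimal sketch running the question's unescaped pattern next to the escaped one on the first example string. Unescaped, the parentheses only form a capture group, so the pattern is effectively .*?, which matches anything; with /g the substitution ends up deleting every character. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $desc = 'blah blah (soups up)';

# Unescaped: ( ) is a capture group, so /g removes every character
(my $broken = $desc) =~ s/(.*?)//gs;

# Escaped: \( and \) match literal brackets
(my $fixed = $desc) =~ s/\(.*\)//gs;

print "broken: '$broken'\n";   # broken: ''
print "fixed:  '$fixed'\n";    # fixed:  'blah blah '
```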

Here's some more info on the topic: http://perlmeme.org/faqs/regexp/metachar_regexp.html

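Second question: what are you intending to do with the question mark in the match? Directly after '*' it makes the quantifier non-greedy, so '.*?' matches as few characters as possible and '\(.*?\)' stops at the first ')'. That is what you want for 'keep (remove) keep (remove)', but with nested brackets such as 'blah blah (soups up (tomato!) )' it leaves ' )' behind. The greedy '\(.*\)' runs to the last ')' instead, which handles the nested case but would also swallow the middle 'keep'.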
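For comparison, a minimal sketch that runs the escaped pattern with the greedy '.*' and the non-greedy '.*?' quantifier over the nested example and the 'keep (remove) keep (remove)' case from the other answer. Neither form handles both inputs, which is where a balanced-match regex helps. The hash names are just for illustration.

```perl
use strict;
use warnings;

my @inputs = (
    'blah blah (soups up (tomato!) )',
    'keep (remove) keep (remove)',
);

my (%greedy, %lazy);
for my $s (@inputs) {
    ($greedy{$s} = $s) =~ s/\(.*\)//gs;    # .*  runs to the LAST ')'
    ($lazy{$s}   = $s) =~ s/\(.*?\)//gs;   # .*? stops at the FIRST ')'
    print "input:  '$s'\n",
          "greedy: '$greedy{$s}'\n",
          "lazy:   '$lazy{$s}'\n\n";
}
```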

Upvotes: 0
