Reputation: 12497
I use Regexp::Assemble in my project, but I don't understand why this little sample doesn't work:
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Assemble;
my $re1 = "(run (?:pre|post)flight script for .+)";
my $re2 = "((?:Configu|Prepa)ring volume .+)";
my $ra = Regexp::Assemble->new;
$ra->add($re1);
$ra->add($re2);
my $global = $ra->re;
print "GLOBAL: $global\n";
1;
I got this error:
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE ?:(run (?:pre|post)flight script for|((?:Configu|Prepa)ring volume) .+)/ at /usr/share/perl5/Regexp/Assemble.pm line 1003.
Edit: If I just print the resulting regexp ($ra->as_string), I get this:
GLOBAL: (?:(run (?:pre|post)flight script for|((?:Configu|Prepa)ring volume) .+)
There is one ')' missing...
Upvotes: 1
Views: 571
Reputation: 4419
I'm the author of R::A. This question comes up every couple of years. The idea is that you don't want to add complex parenthesised patterns. Add more, simpler patterns, e.g.
run preflight script for .+
run postflight script for .+
Configuring volume .+
Preparing volume .+
Don't try to do the work of the module. For instance, your premature grouping has resulted in the trailing .+ common to all patterns not being factored into one occurrence in the regexp. The result is that you have introduced unnecessary backtracking. The more patterns you add, the worse it will be.
Calling add() in a different order will produce the same resulting pattern (or else it's a bug I'd like to know about).
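To make that concrete, here is a minimal sketch of the "more, simpler patterns" approach. It assumes that the goal of the outer parentheses in the question was to capture the matched text, so the capture group is wrapped around the assembled regexp afterwards instead of being passed to add(). The variable names and sample lines are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;

# Four flat patterns with no grouping of our own: the module is left
# to find the common prefixes and the shared trailing .+ by itself.
$ra->add(
    'run preflight script for .+',
    'run postflight script for .+',
    'Configuring volume .+',
    'Preparing volume .+',
);

my $inner = $ra->re;

# Assumption: the outer (...) in the question was meant to capture the
# whole match, so we add a single capture group around the result.
my $global = qr/($inner)/;

print "GLOBAL: $global\n";

for my $line (
    'run preflight script for foo',
    'Preparing volume disk1',
    'something unrelated',
) {
    if ( $line =~ $global ) {
        print "matched: $1\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

Because nothing with nested parentheses goes through add(), the lexer limitation that caused the unbalanced output is never hit.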
Otherwise you can pretokenise the patterns yourself, and use insert() to insert the pattern lexemes directly into the internal trie structure used to build the pattern. (This will be much faster, because the lexer is very slow: it consumes more than half the runtime for assembling a pattern).
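A rough sketch of the insert() route, for comparison. The helper tokens_for() is hypothetical (it is not part of the module): it assumes each pattern is a plain literal prefix with no regexp metacharacters, splits it into one-character tokens, and appends the shared .+ as a single token.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Assemble;

# Hypothetical helper, not part of Regexp::Assemble.
# Assumes $literal contains no regexp metacharacters, so every
# character is a token on its own; the trailing .+ is one token.
sub tokens_for {
    my ($literal) = @_;
    return ( split( //, $literal ), '.+' );
}

my $ra = Regexp::Assemble->new;

# insert() takes a list of tokens and bypasses the lexer entirely.
$ra->insert( tokens_for($_) ) for (
    'run preflight script for ',
    'run postflight script for ',
    'Configuring volume ',
    'Preparing volume ',
);

my $re = $ra->re;
print "ASSEMBLED: $re\n";
```

This only pays off when you already have the patterns in token form or can tokenise them cheaply; for a handful of patterns, add() with simple strings is easier to read.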
Upvotes: 4
Reputation: 78105
Ether's approach seems like a plan. The module documentation for add() specifically warns about this:
add()
... It uses a naive regular expression to lex the string that may be fooled [by] complex expressions (specifically, it will fail to lex nested parenthetical expressions such as ab(cd(ef)?gh)ij correctly). If this is the case, the end of the string will not be tokenised correctly and returned as one long string.
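If you want to see how a given pattern gets tokenised, the module's lexstr() method can be used to inspect it. The sketch below compares a single-level pattern with the nested example from the documentation. As far as I recall lexstr() returns a reference to an array of tokens, but since I am not certain of that, the code accepts either a list or an array reference; treat that part as an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;

my @patterns = (
    'run (?:pre|post)flight script for .+',    # one level of parentheses
    'ab(cd(ef)?gh)ij',                         # nested example from the docs
);

my %lexed;
for my $pattern (@patterns) {
    # Assumption: lexstr() returns an array reference; flatten it if so,
    # and fall back to treating the return value as a plain list.
    my @tokens = map { ref $_ eq 'ARRAY' ? @$_ : $_ } $ra->lexstr($pattern);
    $lexed{$pattern} = \@tokens;

    print "$pattern\n";
    print "  [$_]\n" for @tokens;
}
```

A pattern that lexes into sensible small tokens is safe to pass to add(); one that ends in a single long token is a sign the lexer has been fooled.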
Upvotes: 4
Reputation: 53966
This looks like a bug? You are confusing the regex constructor. See how it combined your two patterns and mismatched the parentheses:
my $re1 = "(run (?:pre|post)flight script for .+)";
my $re2 = "((?:Configu|Prepa)ring volume .+)";
# m/(?:(run (?:pre|post)flight script for|((?:Configu|Prepa)ring volume) .+)/ at...
Try removing the extra set of parentheses from your regexes and see if that helps:
my $re1 = "run (?:pre|post)flight script for .+";
my $re2 = "(?:Configu|Prepa)ring volume .+";
Upvotes: 2