cyber-guard

Reputation: 1846

Lengthy perl regex

This may seem like a somewhat odd question, but to get to the point:

I have a string that I need to search for many possible character sequences in several combinations (so character classes are out of the question). What would be the most efficient way to do this?

I was thinking of either packing everything into one regex:

if ($txt =~ /^(?:really |really |long | regex here)$/){}

or using several smaller comparisons, though I'd assume this won't be very efficient:

if ($txt =~ /^regex1$/ || $txt =~ /^regex2$/ || $txt =~ /^regex3$/) {}

or perhaps nest several if comparisons.

I will appreciate any extra suggestions and other input on this issue. Thanks

Upvotes: 2

Views: 229

Answers (3)

tchrist

Reputation: 80384

Ever since way back in v5.9.2, Perl compiles a set of N alternatives like:

/string1|string2|string3|string4|string5|.../

into a trie data structure, and if that is the first thing in the pattern, even uses Aho–Corasick matching to find the start point very quickly.

That means that your match of N alternatives will now run in O(1) time with respect to N, instead of the O(N) time that this:

if (/string1/ || /string2/ || /string3/ || /string4/ || /string5/ || ...)

will run in.

So you can have O(1) or O(N) performance: your choice.
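As a rough sketch of the single-alternation approach, the pattern can be assembled from a list of literal strings and compiled once. The word list and the variable names here are made up for illustration; quotemeta is used on the assumption that the alternatives are plain strings rather than sub-patterns.

```perl
use strict;
use warnings;

# Hypothetical list standing in for the asker's real alternatives.
my @words = ('really', 'long', 'regex here');

# Escape regex metacharacters so each entry is matched literally,
# then join everything into one alternation.
my $alternation = join '|', map { quotemeta($_) } @words;

# Compile once, anchored the same way as the pattern in the question.
my $re = qr/^(?:$alternation)$/;

for my $txt ('long', 'regex here', 'short') {
    printf "%-12s %s\n", $txt, ($txt =~ $re ? 'matches' : 'no match');
}
```

Keeping the alternatives as literal strings matters here, since that is the case the trie optimisation described above applies to.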

If you use re "debug" or -Mre=debug, Perl will show these trie structures in your patterns.
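A minimal way to try this out might look like the following. The pattern and test string are placeholders, and the exact debug output varies between Perl versions, so treat this as a sketch rather than a reference.

```perl
use strict;
use warnings;

# Dumps the compiled regex program and the match trace to STDERR.
# Roughly the same as running:  perl -Mre=debug script.pl
use re 'debug';

# Placeholder alternation of literal strings.
my $re = qr/^(?:foo|bar|baz)$/;

print "matched\n" if 'bar' =~ $re;
```

In the dump of the compiled program, look for a node whose name starts with TRIE where the alternation would otherwise appear as separate branches.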

Upvotes: 5

Deck

Reputation: 1979

I think it depends on how long your regex is. Sometimes it is better to divide very long expressions.

Upvotes: 0

Joel Berger

Reputation: 20280

This is no substitute for timing tests. If possible, though, I would suggest using the o flag so that Perl doesn't recompile your (large) regex on every evaluation. Of course, this is only possible if those combinations of characters do not change between evaluations.
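For the timing tests, something along these lines with the core Benchmark module could be a starting point. Note that it swaps the o flag for qr//, which also compiles each pattern only once; the word list, the input string, and the iteration count are all invented for illustration, so results on real data may look quite different.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical alternatives; replace with the real ones.
my @words = map { "string$_" } 1 .. 50;

# One precompiled alternation covering every word.
my $alt    = join '|', map { quotemeta($_) } @words;
my $single = qr/^(?:$alt)$/;

# One precompiled regex per word, tried in sequence.
my @separate = map { my $w = quotemeta($_); qr/^$w$/ } @words;

# The last word is the worst case for the chained version.
my $txt = 'string50';

sub match_single { return $txt =~ $single ? 1 : 0 }

sub match_chain {
    for my $re (@separate) {
        return 1 if $txt =~ $re;
    }
    return 0;
}

# Run each sub 50,000 times and print a comparison table.
cmpthese(50_000, {
    single => \&match_single,
    chain  => \&match_chain,
});
```

Trying inputs that match early, match late, and do not match at all gives a fuller picture than a single test string.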

Upvotes: 0
