daisy

Reputation: 23501

Speed up regex matching?

I'm writing a log parser that reads the log line by line. I have around 100 rules, and it works like this:

if ($line =~ /blabla (field1) (field2)/) {
    # do something
}
elsif ($line =~ /something (field1) (field2) else/) {
    # do something
}

But for a big log file, matching every line against so many rules could be slow: with n rules, each line costs up to O(n) regex matches.

So, are there any suggestions for this problem? Since it's not just plain string and wildcard matching, I don't know if there's any data structure I could use.

Upvotes: 3

Views: 245

Answers (4)

ebayindir

Reputation: 441

You can combine multiple regexes into a single one to speed up matching by using Regexp::Assemble.

The following is from the module's description:

Regexp::Assemble takes an arbitrary number of regular expressions and assembles them into a single regular expression (or RE) that matches all that the individual REs match.

As a result, instead of having a large list of expressions to loop over, a target string only needs to be tested against one expression. This is interesting when you have several thousand patterns to deal with. Serious effort is made to produce the smallest pattern possible.

It is also possible to track the original patterns, so that you can determine which, among the source patterns that form the assembled pattern, was the one that caused the match to occur.
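
To make the idea of a single combined pattern with tracking concrete, here is a minimal sketch in core Perl only. It is a hand-rolled alternation, not Regexp::Assemble's API, and it does none of the pattern minimization the module describes. The rule names, the `(\S+)` field patterns, and the `which_rule` helper are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical rules: name => pattern. (\S+) stands in for the real fields.
my @rules = (
    [ blabla    => qr/blabla (\S+) (\S+)/ ],
    [ something => qr/something (\S+) (\S+) else/ ],
);

# Wrap each rule in a named group so the matching branch can be identified.
my $alternation = join '|', map { sprintf '(?<%s>%s)', @$_ } @rules;
my $combined    = qr/$alternation/;

# Returns the name of the rule that matched, or undef if none did.
sub which_rule {
    my ($line) = @_;
    return unless $line =~ $combined;
    my ($name) = keys %+;    # %+ lists only the named groups that captured
    return $name;
}

for my $line ("blabla a b", "x something c d else", "nothing here") {
    my $name = which_rule($line);
    print defined $name ? "$name\n" : "no rule matched\n";
}
```

One caveat with this sketch: the numbered captures of each rule shift once the rules share one pattern, so the fields of the second rule are no longer in `$1` and `$2`.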

Upvotes: 1

ikegami

Reputation: 385849

Perhaps a dispatch table could be used?

my %handlers = (
   blabla    => \&blabla,
   something => \&something,
);

while (my $line = <>) {
   my ($keyword) = $line =~ /^(\S+)/
      or next;

   $handlers{$keyword}
      or next;

   $handlers{$keyword}->($line);
}
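
A fuller version of the same idea, as a sketch that runs on its own. It assumes every log line starts with a keyword that identifies its format, which may not hold for the asker's log. The handler names, the `dispatch` helper, and the `(\S+)` field patterns are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical handlers: each one parses only its own line format.
sub handle_blabla {
    my ($line) = @_;
    my ($f1, $f2) = $line =~ /^blabla (\S+) (\S+)/ or return;
    return "blabla: $f1, $f2";
}

sub handle_something {
    my ($line) = @_;
    my ($f1, $f2) = $line =~ /^something (\S+) (\S+) else/ or return;
    return "something: $f1, $f2";
}

my %handlers = (
    blabla    => \&handle_blabla,
    something => \&handle_something,
);

# One cheap regex plus one hash lookup selects the handler,
# so at most one rule-specific regex runs per line.
sub dispatch {
    my ($line) = @_;
    my ($keyword) = $line =~ /^(\S+)/ or return;
    my $handler = $handlers{$keyword} or return;
    return $handler->($line);
}

for my $line ("blabla a b", "something c d else", "unknown x y") {
    my $result = dispatch($line);
    print defined $result ? "$result\n" : "skipped: $line\n";
}
```

With this layout the per-line cost no longer grows with the number of rules, as long as the first word really does select the rule.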

Upvotes: 7

Andy Lester

Reputation: 93676

I believe your optimizations are premature.

Have you tried it with this notional big log file? Is it actually too slow? If it is, use a profiling tool like Devel::NYTProf to find out exactly what is slow.
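
Before reaching for a profiler, a rough wall-clock measurement can show whether there is a problem at all. The sketch below uses the core Time::HiRes module. The `parse_line` function is a hypothetical stand-in for the existing if/elsif chain, and the input is synthetic, so the number it prints says nothing about the real log.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical stand-in for the existing chain of rules.
sub parse_line {
    my ($line) = @_;
    if    ($line =~ /blabla (\S+) (\S+)/)         { return 'blabla' }
    elsif ($line =~ /something (\S+) (\S+) else/) { return 'something' }
    return;
}

# Synthetic input; a real measurement should read the actual log file.
my @lines = map { $_ % 2 ? "blabla a$_ b$_" : "something c$_ d$_ else" }
            1 .. 100_000;

my $start   = [gettimeofday];
my $matched = grep { defined(parse_line($_)) } @lines;
my $elapsed = tv_interval($start);

printf "%d of %d lines matched in %.3f s\n",
    $matched, scalar @lines, $elapsed;
```

For a per-statement breakdown, Devel::NYTProf is typically run as `perl -d:NYTProf yourscript.pl` followed by `nytprofhtml` (the script name here is a placeholder).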

Upvotes: 5

Deck

Reputation: 1979

I would recommend redesigning your log parser. Maybe I am wrong, but I think you are trying to match every case that could occur in the log file.

Try using a lexical and syntactic parser instead. Sorry, I don't know of good examples in Perl, but something like Parse::Yapp might work.

Upvotes: 1
