maestromusica
maestromusica

Reputation: 706

Is a single big regex more efficient than a bunch of smaller ones?

I'm working on a function that uses regular expressions to find some product codes in a (very long) string given as an argument.

There are many possible forms of that code, for example:

UK[A-z]{10} or DE[A-z]{20} or PL[A-z]{7} or...

What solution would be better? Many (most probably around 20-50) small regular expressions or one huge monster-regex that matches them all? What is better when performance is concerned?

Upvotes: 5

Views: 3567

Answers (1)

Casimir et Hippolyte
Casimir et Hippolyte

Reputation: 89557

It depends what kind of big regex you write. If you end with a pathological pattern it's better to test smaller patterns. Example:

UK[A-Za-z]{10}|DE[A-Za-z]{20}|PL[A-Za-z]{7}

this pattern is very inefficient because it starts with an alternation, this means that in the worst case (no match) each alternative needs to be tested for all positions in the string.
(* Note that a regex engine like PCRE is able to quickly find potentially matching positions when each branch of an alternation starts with literal characters.)

But if you write your pattern like this:

(?=[UDP][KEL])(?:UK[A-Za-z]{10}|DE[A-Za-z]{20}|PL[A-Za-z]{7})

or the variation:

[UDP][KEL](?:(?<=UK)[A-Za-z]{10}|(?<=DE)[A-Za-z]{20}|(?<=PL)[A-Za-z]{7})

Most of the positions where the match isn't possible are quickly discarded before the alternation.

Also, when you write a single pattern, obviously, the string is parsed only once.

Upvotes: 6

Related Questions