avgvstvs
avgvstvs

Reputation: 6325

What's the Time Complexity of Average Regex algorithms?

I'm not new to using regular expressions, and I understand the basic theory they're based on--finite state machines.

I'm not so good at algorithmic analysis though and don't understand how a regex compares to say, a basic linear search. I'm asking because on the surface it seems like a linear array search. (If the regex is simple.)

Where could I go to learn more about implementing a regex engine?

Upvotes: 78

Views: 75800

Answers (3)

Oscar Mederos
Oscar Mederos

Reputation: 29863

Are you familiar with the term Deterministic/Non-Deterministic Finite Automata?

Real regular expressions (when I say real I'm refering to those regex that recognize Regular Languages, and not the regex that almost every programming language include with backreferences, etc) can be converted into a DFA/NFA and both can be implemented in a mechanical way in a programming language (a NFA can be converted into a DFA)

What you have to do is:

  1. Find a way to convert a regex into an automaton
  2. Implement the recognition of the automaton in the programming language of your preference

That way, given a regex you can convert it to a DFA and run it to see if it matches or not a specified text.

This can be implemented in O(n), because DFA don't go backward (like a Turing Machine), so it matches the string or not. That is supposing you won't take in count overlapped matches, otherwise you will have to go back and start matching again...

Upvotes: 13

mcdowella
mcdowella

Reputation: 19621

The classic regular expression can be implemented in a way which is fast in practice but has really bad worst case behaviour (the standard DFA) or in a way which has guaranteed reasonable worst case behaviour (keeping it as an NFA). The standard DFA can be extended to support lots of extra matching characters and flags, which make use of the fact that it is basically back-tracking search.

Examples of the standard approach are everywhere (e.g. built into Perl). There is an example that claims good worst case behaviour at http://code.google.com/p/re2/ - in fact it is even better than I expected in the worst case, so they may have found an extra trick or two.

If you are at all interested in this, or care about writing programs that can be made to lock up solid given pathological inputs, read http://swtch.com/~rsc/regexp/regexp1.html.

Upvotes: 6

porges
porges

Reputation: 30580

This is one of the most popular outlines: Regular Expression Matching Can Be Simple And Fast . Running a DFA-compiled regular expression against a string is indeed O(n), but can require up to O(2^m) construction time/space (where m = regular expression size).

Upvotes: 88

Related Questions