Compiler matching string to regex, using NFA

Question

I was reading about compilers, the chapter about lexical analyzers(scanners) and I'm puzzled by the following statement:

For an input string X and a regular expression R, the complexity for finding a match using non-deterministic finite automata is:

O(len R * len X)

How can the complexity be polynomial in len R? I'm under the impression that it depends exponentially on len R, because whenever we have a character which may appear a variable number of times(ie followed by the * symbol), we must test all possible number of occurences. If we have multiple characters which appear a variable number of times, we must check all possibilities(by backtracking).

Where am I wrong?

Bergi · Accepted Answer

we must check all possibilities(by backtracking).

Not necessarily by backtracking. There are many ways to implement an NFA. By moving through the input linearly, and transitioning to multiple states at the same time (storing the set of active states in an O(1)-lookup structure), you will get exactly the mentioned runtime - number of states in NFA is linear to length of regex.

See also the popular articel Regular Expression Matching Can Be Simple And Fast.

Compiler matching string to regex, using NFA

Answers (1)

Related Questions