Dennis Grinch

Reputation: 363

Parallel regex matching with NFA vs DFA? Which one is faster?

I was reading about NFAs and DFAs, and it seems that the most popular and fastest way of implementing a regex matcher is to create an NFA from the regex, convert it to a DFA, minimize that DFA, implement it in some language, and use it.

A DFA is a better choice than an NFA because it has only one transition per input symbol, while an NFA can have many. Thus, a DFA has only one path to follow, while an NFA has many.

But this is what I do not understand. Why do we have to keep track of NFA states and go back to them, which slows us down? Couldn't we split into different threads whenever an input leads to more than one state and compute each path in parallel? Wouldn't that be faster than a DFA? Or am I missing something?

Upvotes: 3

Views: 1243

Answers (1)

Kaz

Reputation: 58568

Generally speaking, a DFA is faster, but an NFA is more compact. The size of the NFA is proportional to the size of the regular expression. (Informal proof: each operator node in a regular expression's syntax tree just adds a new node to the NFA graph.) Because the DFA's states are formed from subsets of the NFA's states, it can be quite large; in the worst case, a DFA is exponentially sized w.r.t. the regular expression. An example is an expression of the form (a|b)*a(a|b)(a|b)...(a|b) with N-1 trailing (a|b) units ("the N-th character from the end is an a"): it translates to a minimal DFA whose size is O(2**N), because the DFA must dedicate a distinct state to every combination of a's and b's among the last N characters. A degenerate DFA could exceed the size of the CPU cache in cases where the data structures required to simulate the equivalent NFA fit into cache.
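To make the blowup concrete, here is a minimal sketch (my own illustration, not part of the original answer) that hand-builds the NFA for (a|b)*a(a|b)^(N-1) and runs the subset construction to count reachable DFA states:

    def nfa_delta(state, ch, n):
        """Hand-built NFA: state 0 loops on a/b and, on 'a', guesses that this
        'a' is the N-th character from the end; states 1..n-1 just advance;
        state n is accepting."""
        succ = set()
        if state == 0:
            succ.add(0)                  # (a|b)* self-loop
            if ch == 'a':
                succ.add(1)              # nondeterministic guess
        elif state < n:
            succ.add(state + 1)          # consume one trailing (a|b) unit
        return succ

    def count_dfa_states(n):
        start = frozenset({0})
        seen, todo = {start}, [start]
        while todo:
            subset = todo.pop()
            for ch in 'ab':
                nxt = frozenset(s for q in subset for s in nfa_delta(q, ch, n))
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return len(seen)

    for n in range(1, 9):
        print(n, count_dfa_states(n))    # prints 2, 4, 8, ..., i.e. 2**n

The NFA here has only N+1 states, so the contrast with the 2**N reachable DFA subsets is exactly the compactness point above: each subset has to remember which of the last N characters were a's.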

There is somewhat more up-front cost to a DFA, due to the extra construction steps. So a tradeoff applies: will enough data be matched to justify the cost of building the DFA, or is it cheaper to just simulate the NFA?
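As a rough illustration of that tradeoff, with entirely made-up cost constants (just to show the shape of the break-even calculation, not real measurements):

    NFA_COST_PER_CHAR = 8.0      # hypothetical: state-set simulation is slower
    DFA_COST_PER_CHAR = 1.0      # hypothetical: one table lookup per character
    DFA_BUILD_COST = 50_000.0    # hypothetical: subset construction + minimization

    def cheaper_to_build_dfa(total_chars):
        nfa_total = NFA_COST_PER_CHAR * total_chars
        dfa_total = DFA_BUILD_COST + DFA_COST_PER_CHAR * total_chars
        return dfa_total < nfa_total

    # Break-even: DFA_BUILD_COST / (NFA_COST_PER_CHAR - DFA_COST_PER_CHAR) chars
    print(cheaper_to_build_dfa(5_000), cheaper_to_build_dfa(50_000))  # False True

Below the break-even input size the NFA simulator wins; above it, the DFA amortizes its construction cost.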

An NFA simulation can entirely avoid touching parts of the regular expression which don't apply to an input at all. For instance, suppose a regex has the form R1|R2, where R1 is very simple and small, and R2 is a huge, complicated beast. Suppose the inputs usually just match R1 and R2 hardly ever applies (no part of it at all, say, because some prefix of it fails to match). This influences the tradeoff: compiling to a DFA means everything is compiled, the simple R1 part and the monstrous R2 part alike.
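Here is a toy sketch of that effect (my own hand-built example, not from the answer): an NFA for the alternation "ok|zzzzzzzzzz", simulated with the usual state-set approach. After the first input character fails to match 'z', every state belonging to the big right-hand branch drops out of the active set, so its internals are never visited again.

    def simulate(nfa, starts, accepts, text):
        """State-set NFA simulation; prints the active set after each step."""
        active = set(starts)
        for ch in text:
            active = {s for q in active for s in nfa.get((q, ch), ())}
            print(ch, sorted(active))
        return bool(active & accepts)

    # R1 = "ok" uses states r0..r2; R2 = "zzzzzzzzzz" uses states h0..h10.
    nfa = {('r0', 'o'): {'r1'}, ('r1', 'k'): {'r2'}}
    for i in range(10):
        nfa[('h%d' % i, 'z')] = {'h%d' % (i + 1)}

    print(simulate(nfa, starts={'r0', 'h0'}, accepts={'r2', 'h10'}, text='ok'))
    # After 'o', only r1 is active; the h-states never appear again.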

Lastly, an implementation doesn't have to be strictly NFA or DFA. An NFA simulator can cache the state sets which it computes. Those cached state sets are equivalent to DFA states and provide a similar benefit to compilation to a DFA; you can think of it as "JIT for the NFA". The cache can be capped at some fixed size and subject to a replacement policy, so that expressions whose complete DFAs would be large can be handled in less memory (and nearly as fast, if the data shows good locality of reference in the cache).
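A minimal sketch of that caching idea, assuming a hypothetical nfa_delta(state, ch) function that returns the successor NFA states (this is my own illustration of lazy DFA construction with a bounded cache, not Kaz's or any particular library's implementation):

    def make_cached_matcher(nfa_delta, starts, accepts, cache_cap=4096):
        cache = {}                            # (state set, char) -> state set

        def step(subset, ch):
            key = (subset, ch)
            nxt = cache.get(key)
            if nxt is None:
                nxt = frozenset(s for q in subset for s in nfa_delta(q, ch))
                if len(cache) >= cache_cap:        # crude replacement policy:
                    cache.pop(next(iter(cache)))   # evict an arbitrary entry
                cache[key] = nxt              # a lazily discovered DFA edge
            return nxt

        def match(text):
            subset = frozenset(starts)
            for ch in text:
                subset = step(subset, ch)
                if not subset:                # no live NFA state: reject early
                    return False
            return bool(subset & accepts)

        return match

    # Tiny usage example: NFA for "ab*" with states 0 (start) and 1 (accepting).
    def delta(state, ch):
        if state == 0 and ch == 'a': return {1}
        if state == 1 and ch == 'b': return {1}
        return set()

    match = make_cached_matcher(delta, starts={0}, accepts={1})
    print(match("abbb"), match("ba"))         # True False

With an unbounded cache this converges to exactly the reachable DFA; the size cap trades some recomputation for bounded memory, which is the "trimmed to a fixed size" point above.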

Upvotes: 4
