Reputation: 78536

efficiently compute the edit distance between 1 string and a large set of other strings?

The use case is auto-complete options where I want to rank a large set of other strings by how like a fixed string they are.

Is there any bastardization of something like a DFA RegEx that can do a better job than the start over on each option solution?

The guy who asked this question seems to know of a solution but doesn't list any sources.

(p.s. "Read this link" type answer welcome.)

Upvotes: 1

Answers (1)

Reputation: 10565

I did something like this recently. Unfortunately it's closed source.

The solution is to write a levenshtein automaton. Spoiler: it's a NFA.

Although many people will try to convince you that simulating NFAs is exponential, it isn't. Creating a DFA from NFA is exponential. Simulating is just polynomial. Many regex engines are writen with sub-optimal algorithms based on this.

NFA simulation is O(n*m) for a n-sized string and m states. Or O(n) amortized if you convert it to a DFA lazily (and cache it).

I'm afraid you'll either have to deal with complex automata libraries or will have to write a lot of code (what I did).

Upvotes: 2