Reputation: 6078
I'm wondering whether there's an efficient data structure to perform "Retrieve all strings with levenshtein distance less than X".
Few things I'm interested in:
Upvotes: 5
Views: 1284
Reputation: 74645
this is nearest neighborer search in a metric space with levenshtein distance as the metric (or distance) function
a VP-tree is one of the ways of solving that problem
this Python VP-tree implementation is a working demo that shows how a VP-tree works run it on say a word list it provides an interactive shell where you type a word and it returns the words in that list that are no more then X distance from the word you typed
Upvotes: 3
Reputation: 24988
Sounds like a simple breadth-first search with each generation being just one 'edit' away from the previous - and with checks in place to ensure that a string appears in one-and-only-one level.
This would be easily implemented using a couple of hashsets / hashtables in a pair of loops.
Upvotes: 0