efficient way of making a graph of a dictionary

Question

What is the most efficient way of making a graph of the words in a dictionary, with an edge between every pair of words at Hamming distance 1?

Nick Johnson · Accepted Answer

Hamming distance is only defined for words of equal length, so you'll actually have a separate, disjoint graph for each word length in your dictionary. If you meant Levenshtein distance, which also permits insertions and deletions, then you will indeed have a single graph, since edges can connect words whose lengths differ by one.
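To make the equal-length point concrete, here is a minimal sketch. The function names `hamming` and `group_by_length` are my own, purely for illustration.

```python
from collections import defaultdict


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def group_by_length(words):
    """Partition words by length; each group yields its own disjoint graph."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)
```

Only words inside the same group can ever be joined by an edge, so each group can be processed independently.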

One option is to construct a BK-tree from your dictionary. While not strictly speaking a graph, it lets you ask the same questions (getting a list of the elements within a given distance of a word), and it typically takes O(n log n) time to construct.
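A rough sketch of a BK-tree, assuming Levenshtein distance as the metric. The class and method names (`BKTree`, `add`, `query`) are illustrative, not taken from any particular library.

```python
def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, children), with children keyed by distance to word."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def query(self, word, max_dist):
        """Return all stored words within max_dist of word."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_dist:
                results.append(candidate)
            # The triangle inequality rules out every other subtree.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return results
```

Querying each word with `max_dist=1` returns that word's neighbours (plus the word itself), which is the information the graph would hold.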

Another option is brute force: for every word, test its distance to all candidate words. You can narrow the candidates down to words of the same length (or, for Levenshtein, words whose length differs by at most one). This is O(n^2) in the worst case, but that may be acceptable if you're not building the graph more than once.
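The brute-force approach for Hamming distance 1 could look something like this. It is a sketch under the assumption that the graph is stored as a dict of neighbour sets; `differs_by_one` and `build_graph` are names I made up.

```python
from collections import defaultdict
from itertools import combinations


def differs_by_one(a, b):
    """True if equal-length strings a and b differ in exactly one position."""
    diff = 0
    for x, y in zip(a, b):
        if x != y:
            diff += 1
            if diff > 1:
                return False  # bail out early, no need to scan the rest
    return diff == 1


def build_graph(words):
    """Map each word to the set of words at Hamming distance 1 from it."""
    unique = set(words)
    by_length = defaultdict(list)
    for word in unique:
        by_length[len(word)].append(word)

    graph = {word: set() for word in unique}
    # Only compare words of the same length; other pairs can never match.
    for group in by_length.values():
        for a, b in combinations(group, 2):
            if differs_by_one(a, b):
                graph[a].add(b)
                graph[b].add(a)
    return graph
```

For example, `build_graph(["cat", "cot", "cog", "dog"])` links cat to cot, cot to cog, and cog to dog.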

Theoretically, there's probably an O(n log n) method of constructing the graph, but I'm not aware of an elegant one. The trivial approach, constructing a BK-tree and then generating the graph by querying it for each word, is O(mn log n), where m is the average number of edges per node.
