gcbenison
gcbenison

Reputation: 11963

Transform one word into another by changing, inserting, or deleting one character at a time

Given a finite dictionary of words and a start-end pair (e.g. "hands" and "feet" in the example below), find the shortest sequence of words such that any word in the sequence can be formed from either of its neighbors by either 1) inserting one character, 2) deleting one character, or 3) changing one character.

hands -> hand -> and -> end -> fend -> feed -> feet

For those who may be wondering - this is not a homework problem that was assigned to me or a question I was asked in an interview; it is simply a problem that interests me.

I am looking for a one- or two- sentence "top down view" of what approach you would take -- and for the daring, a working implementation in any language.

Upvotes: 4

Views: 6701

Answers (3)

Reinstate Monica
Reinstate Monica

Reputation: 4723

Instead of turning the dictionary into a full graph, use something with a little less structure:

For each word in the dictionary, you get a shortened_word by deleting character number i for each i in len(word). Map the pair (shortened_word, i) to a list of all the words.

This helps looking up all words with one replaced letter (because they must be in the same (shortened_word, i) bin for some i, and words with one more letter (because they must be in some (word, i) bin for some i.

The Python code:

from collections import defaultdict, deque
from itertools import chain

def shortened_words(word):
    for i in range(len(word)):
        yield word[:i] + word[i + 1:], i


def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)
    return g


def walk_graph(g, d, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break

        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        for next_word in chain(same_length, one_longer, one_shorter):
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable

    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]

And the usage:

dictionary = ispell_dict # list of 47158 words

graph = prepare_graph(dictionary)
print(" -> ".join(walk_graph(graph, dictionary, "hands", "feet")))
print(" -> ".join(walk_graph(graph, dictionary, "brain", "game")))

Output:

hands -> bands -> bends -> bents -> beets -> beet -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game

A word about speed: building the 'graph helper' is fast (1 second), but hands -> feet takes 14 seconds, and brain --> game takes 7 seconds.

Edit: If you need more speed, you can try using a graph or network library. Or you actually build the full graph (slow) and then find paths much faster. This mostly consists of moving the look-up of edges from the walking function to the graph-building function:

def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)

    next_words = {}
    for word in d:
        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        next_words[word] = set(chain(same_length, one_longer, one_shorter))
        next_words[word].remove(word)

    return next_words


def walk_graph(g, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break

        for next_word in g[word]:
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable

    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]

Usage: Build the graph first (slow, all timings on some i5 laptop, YMMV).

dictionary = ispell_dict # list of 47158 words
graph = prepare_graph(dictionary)  # more than 6 minutes!

Now find the paths (much faster than before, times without printing):

print(" -> ".join(walk_graph(graph, "hands", "feet")))          # 10 ms
print(" -> ".join(walk_graph(graph, "brain", "game")))          #  6 ms
print(" -> ".join(walk_graph(graph, "tampering", "crunchier"))) # 25 ms

Output:

hands -> lands -> lends -> lens -> lees -> fees -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game
tampering -> tapering -> capering -> catering -> watering -> wavering -> havering -> hovering -> lovering -> levering -> leering -> peering -> peeping -> seeping -> seeing -> sewing -> swing -> swings -> sings -> sines -> pines -> panes -> paces -> peaces -> peaches -> beaches -> benches -> bunches -> brunches -> crunches -> cruncher -> crunchier

Upvotes: 4

skytreader
skytreader

Reputation: 11707

Quick answer. You can compute for the Levenshtein distance, the "common" edit distance in most dynamic programming texts, and, from the computation table generated, try to build that path.

From the Wikipedia link:

d[i, j] := minimum
               (
                 d[i-1, j] + 1,  // a deletion
                 d[i, j-1] + 1,  // an insertion
                 d[i-1, j-1] + 1 // a substitution
               )

You can take note of when these happens in your code (maybe, in some auxiliary table) and, surely, it'd be easy reconstructing a solution path from there.

Upvotes: 1

VeeArr
VeeArr

Reputation: 6178

A naive approach could be to turn the dictionary into a graph, with the words as nodes and the edges connecting "neighbors" (i.e. words that can be turned into one another via one operation). Then you could use a shortest-path algorithm to find the distance between word A and word B.

The hard part about this approach would be finding a way to efficiently turn the dictionary into a graph.

Upvotes: 1

Related Questions