Python_learner
Python_learner

Reputation: 51

Overall shortest path to selected nodes

I have been stuck on a following question for a while. This problem is a classic Word ladder problem with the slight modifications. I need to write a function which would return an index of the first_word in some given undirected graph (where each edge has weight 1) such that it creates the overall shortest word ladders to each of the words in desired_words (to be more precise it returns an index of the word for which the longest word ladder to any of the words in desired_words is as short as possible); if multiple words are possible it would just return one of them.

Here is some quick example: Let's say I have following graph enter image description here

Let's say I have following list, lst = ["aaa","aad","dad","daa","aca","acc","aab","abb"], After that we create an instance of the class of Graph. And if for example our desired_words are dad, abb, acc then our function would return 0 because the best first_word in this case would be aaa. If we run our function on aaa, aca, acc then it would return 4 because in this case the best first_word would be aca.

I was thinking of various approaches: run Dijkstra' algorithm on each node, and then find the shortest distance to each of the desired_words, or find a graph (Jordan) center, of find a lowest common ancestor between the nodes, but none of them seem to make sense to me. I just don't know how can I apply any of the known algorithms, such that it would return the most optimal node.

Can anyone suggest what would be the correct approach to tackle this problem? Just a general idea, or some step-by-step algorithm. Thank you very much.

P.S. python code would be helpful but not necessary.

Upvotes: 1

Views: 243

Answers (1)

trincot
trincot

Reputation: 350137

You could launch a separate BFS sourced at each of the desired nodes, ... but in sync. As soon as you find a node that has been visited from all of the BFS sources, you have found a solution.

The key is thus to maintain for each node a visited flag per source. This could be a set of source nodes (a source node is at the start of a BFS).

Here is an implementation. The first part builds the graph (as adjacency list), and the second part performs the multiple BFS:

from collections import defaultdict

def ladder(words):
    # Group words when they share a n-1 subsequence
    common = [defaultdict(list) for _ in words[0]]
    for word in words:
        for i, ch in enumerate(word):
            # i-th character removed is key
            common[i][word[:i] + word[i+1:]].append(word)
    
    # Create the graph from those word groups:
    graph = defaultdict(list)
    for dct in common:
        for words in dct.values():
            for i, word in enumerate(words):
                graph[word].extend(words[:i] + words[i+1:])

    return dict(graph)


def getfirstword(ladder, desiredwords):
    visited = {  # For each word: one flag per BFS source
        word: set() 
        for word in graph
    }

    desiredwords = set(desiredwords)  # Remove duplicates
    n = len(desiredwords)
    frontiers = {  # One entry per BFS source
        word: [word]
        for word in desiredwords
    }
    for word in desiredwords:
        visited[word].add(word)

    while any(frontiers):  # If ever False, then graph is disconnected
        for source, words in frontiers.items():
            frontier = []
            for word in words:
                for neighbor in graph[word]:
                    if source not in visited[neighbor]:
                        visited[neighbor].add(source)
                        if len(visited[neighbor]) == n:
                            return neighbor
                        frontier.append(neighbor)
            frontiers[source] = frontier

Your example graph is created as follows:

words = ["aaa","aad","dad","daa","aca","acc","aab","abb"]
graph = ladder(words)

print("Adjacency list:")
for word, neighbors in graph.items():
    print(f"'{word}': {neighbors}")

This outputs:

Adjacency list:
'aaa': ['daa', 'aca', 'aad', 'aab']
'daa': ['aaa', 'dad']
'aad': ['dad', 'aaa', 'aab']
'dad': ['aad', 'daa']
'aca': ['aaa', 'acc']
'acc': ['aca']
'aab': ['abb', 'aaa', 'aad']
'abb': ['aab']

The two example queries you discussed can be made as follows:

queries = [
    ["dad", "abb", "acc"],
    ["aaa", "aca", "acc"]
]
for query in queries:
    print(f"For desired words {query} a solution is: '{getfirstword(graph, query)}'")

This outputs:

For desired words ['dad', 'abb', 'acc'] a solution is: 'aaa'
For desired words ['aaa', 'aca', 'acc'] a solution is: 'aca'

Upvotes: 2

Related Questions