Reputation: 1321

calculate distance between 2 nodes in a graph

I have directed graph stored in the following format in the database {STARTNODE, ENDNODE}. Therefore, {5,3} means there is an arrow from node 5 to node 3.

Now I need to calculate the distance between two random nodes. What is the most efficient way? By the way, the graph is has loops.

Thanks a lot!

Upvotes: 2

Answers (5)

unutbu

Reputation: 879541

If by distance we mean the minimum number of hops, then you could use Guido van Rossum's find_shortest_path function:

def find_shortest_path(graph, start, end, path=[]):
    """
    __source__='https://www.python.org/doc/essays/graphs/'
    __author__='Guido van Rossum'
    """
    path = path + [start]
    if start == end:
        return path
    if not graph.has_key(start):
        return None
    shortest = None
    for node in graph[start]:
        if node not in path:
            newpath = find_shortest_path(graph, node, end, path)
            if newpath:
                if not shortest or len(newpath) < len(shortest):
                    shortest = newpath
    return shortest

if __name__=='__main__':
    graph = {'A': ['B', 'C'],
             'B': ['C', 'D'],
             'C': ['D'],
             'D': ['C'],
             'E': ['F'],
             'F': ['C']}
    print(find_shortest_path(graph,'A','D'))
    # ['A', 'B', 'D']
    print(len(find_shortest_path(graph,'A','D')))
    # 3

Upvotes: 6

Gant

Reputation: 29889

Given that the distance is the number of hops, and is optimal (shortest path.) You may keep track of visited nodes and current reachable nodes using Python's list/set. Starts from the first node and then keep hopping from the current set of nodes until you reach the target.

For example, given this graph:

alt text

[hop 0]
visited: {}
current: {A}

[hop 1]
visited: {A}
current: {B, C, J}

[hop 2]
visited: {A, B, C, J}
current: {D, E, F, G, H}

[hop 3]
visited: {A, B, C, D, E, F, G, H, J}
current: {K} // destination is reachable within 3 hops

The point of visited-node list is to prevent visiting the visited nodes, resulting in a loop. And to get shortest distance, it's no use to make a revisit as it always makes the distance of resulting path longer.

This is a simple implementation of Breadth-first search. The efficiency partly depends on how to check visited nodes, and how to query adjacent nodes of given node. The Breadth-first search always guarantee to give optimal distance but this implementation could create a problem if you have lots of node, say billion/million, in your database. I hope this gives the idea.

Upvotes: 5

Tamás

Reputation: 48061

If you are really looking for the most efficient way, then the solution is to implement breadth first search in C and then call the implementation from the Python layer. (Of course this applies only if the edges are unweighted; weighted edges require Dijkstra's algorithm if the weights are non-negative, or the Bellman-Ford algorithm if weights can be negative).

Incidentally, the igraph library implements all these algorithms in C, so you might want to try that. If you prefer a pure Python-based solution (which is easier to install than igraph), try the NetworkX package.

Upvotes: 0