Reputation: 1321
I have directed graph stored in the following format in the database {STARTNODE, ENDNODE}. Therefore, {5,3} means there is an arrow from node 5 to node 3.
Now I need to calculate the distance between two random nodes. What is the most efficient way? By the way, the graph is has loops.
Thanks a lot!
Upvotes: 2
Views: 33565
Reputation: 879541
If by distance we mean the minimum number of hops, then you could use Guido van Rossum's find_shortest_path function:
def find_shortest_path(graph, start, end, path=[]):
"""
__source__='https://www.python.org/doc/essays/graphs/'
__author__='Guido van Rossum'
"""
path = path + [start]
if start == end:
return path
if not graph.has_key(start):
return None
shortest = None
for node in graph[start]:
if node not in path:
newpath = find_shortest_path(graph, node, end, path)
if newpath:
if not shortest or len(newpath) < len(shortest):
shortest = newpath
return shortest
if __name__=='__main__':
graph = {'A': ['B', 'C'],
'B': ['C', 'D'],
'C': ['D'],
'D': ['C'],
'E': ['F'],
'F': ['C']}
print(find_shortest_path(graph,'A','D'))
# ['A', 'B', 'D']
print(len(find_shortest_path(graph,'A','D')))
# 3
Upvotes: 6
Reputation: 29889
Given that the distance is the number of hops, and is optimal (shortest path.) You may keep track of visited nodes and current reachable nodes using Python's list/set. Starts from the first node and then keep hopping from the current set of nodes until you reach the target.
For example, given this graph:
[hop 0]
visited: {}
current: {A}
[hop 1]
visited: {A}
current: {B, C, J}
[hop 2]
visited: {A, B, C, J}
current: {D, E, F, G, H}
[hop 3]
visited: {A, B, C, D, E, F, G, H, J}
current: {K} // destination is reachable within 3 hops
The point of visited-node list is to prevent visiting the visited nodes, resulting in a loop. And to get shortest distance, it's no use to make a revisit as it always makes the distance of resulting path longer.
This is a simple implementation of Breadth-first search. The efficiency partly depends on how to check visited nodes, and how to query adjacent nodes of given node. The Breadth-first search always guarantee to give optimal distance but this implementation could create a problem if you have lots of node, say billion/million, in your database. I hope this gives the idea.
Upvotes: 5
Reputation: 48061
If you are really looking for the most efficient way, then the solution is to implement breadth first search in C and then call the implementation from the Python layer. (Of course this applies only if the edges are unweighted; weighted edges require Dijkstra's algorithm if the weights are non-negative, or the Bellman-Ford algorithm if weights can be negative).
Incidentally, the igraph library implements all these algorithms in C, so you might want to try that. If you prefer a pure Python-based solution (which is easier to install than igraph), try the NetworkX package.
Upvotes: 0
Reputation: 1997
As you can see here
If you have unweighted edges you can use BFS
If you have non-negative edges you can use Dijkstra
If you have negative or positive edges you most use Bellman-Ford
Upvotes: 14