Reputation: 33
I am currently working on a model of a polymer, and to investigate it statistically I need to calculate the topological distance (i.e. the shortest path length in the graph) between every pair of nodes in the polymer (or graph, in this case).
Now, for small graphs of fewer than 1,000 nodes the computation is fairly manageable, with reasonable waiting times, but for larger graphs the calculation takes hours to days to complete.
It is important to note that my graph is sparse: if we simplify it to a long chain, each chain in the graph is connected to its neighbor (like the matrix of a coupled-oscillator problem), and some chains are also connected to each other with a very small probability.
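For concreteness, the kind of graph I mean can be sketched like this (the node count and crosslink probability are just placeholder values):

```python
import numpy as np
from scipy.sparse import lil_matrix

def chain_with_crosslinks(n=1000, p=1e-4, seed=0):
    """Linear chain plus rare random crosslinks, as a sparse adjacency matrix."""
    rng = np.random.default_rng(seed)
    A = lil_matrix((n, n), dtype=np.int8)
    for i in range(n - 1):          # backbone: each node linked to its neighbor
        A[i, i + 1] = A[i + 1, i] = 1
    for i in range(n):              # rare long-range links with small probability p
        for j in range(i + 2, n):
            if rng.random() < p:
                A[i, j] = A[j, i] = 1
    return A.tocsr()
```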
First I tried the obvious choice, BFS. It worked okay on smaller graphs, but it took far too long on bigger ones (above roughly 500 nodes I'd say it wasn't practical).
After that I tried a Monte Carlo method with a Euclidean bias to guide the direction of the search, which was much more efficient than BFS, but the results weren't very accurate, and for many pairs of nodes it couldn't find a path at all.
I then tried the A* algorithm, which worked surprisingly well, with calculation times even better than Monte Carlo and results as accurate as BFS, which really surprised me.
Due to the nature of the calculation, though, for large graphs (for example, 2,000 nodes) the calculation takes almost 4 hours. I would like to eventually run this calculation on a graph of at least 5,000 nodes, and 10,000 would be exactly what I need.
I wanted to know if anyone is familiar with other algorithms suited to this problem? I have searched for methods used in similar models but didn't find anything significantly helpful.
Thanks in advance!
Upvotes: 3
Views: 58
Reputation: 59283
This problem is called "All Pairs Shortest Paths".
For general graphs, this is traditionally solved in O(n³) time using the Floyd-Warshall algorithm, where n is the number of nodes in the graph.
Note that the space required, and the size of the output produced, is O(n²). That is a lot of output for big graphs. Your maximum graph size will probably top out at 100,000 or so, limited by your system's memory.
Also note that n³ grows pretty quickly. If you're writing this in C or equivalent, it will be very fast for a graph with 1,000 nodes (a few milliseconds), but for 100,000 nodes it could take a couple of hours. If you're writing in a slower language it will be slower still, but this is a common algorithm, so you may find a fast library implementation; scipy has one, for example.
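For instance, a minimal sketch using scipy's implementation (the 5-node chain here is just a stand-in for a real graph):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import floyd_warshall

# toy input: adjacency matrix of a 5-node chain 0-1-2-3-4
n = 5
rows = np.arange(n - 1)
cols = np.arange(1, n)
A = csr_matrix((np.ones(n - 1), (rows, cols)), shape=(n, n))

# dense n x n matrix of shortest path lengths: O(n^3) time, O(n^2) memory
dist = floyd_warshall(A, directed=False, unweighted=True)
print(dist)
```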
For sparse graphs, where each node has only a few edges on average, BFS is much faster than Floyd-Warshall, taking O(n²) time in total.
Comparing with the above, with an average of 4 edges per node or fewer, you should be able to calculate all shortest path lengths for 100,000 nodes in a few seconds.
From your description, it sounds like you tried BFS but ran it separately for each pair of nodes. That's much more work than you need to do, because a single BFS from one source node finds the shortest paths to all other nodes. You only need n searches altogether.
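A sketch of that approach (plain Python for clarity; for real sizes, scipy.sparse.csgraph.shortest_path with unweighted=True computes the same hop-count matrix in compiled code):

```python
import numpy as np
from collections import deque

def all_pairs_hop_counts(adj):
    """adj: list of neighbor lists. Returns an n x n array of hop counts (-1 = unreachable)."""
    n = len(adj)
    dist = np.full((n, n), -1, dtype=np.int32)
    for src in range(n):                  # one BFS per source: O(n + m) each
        d = dist[src]                     # view into the output row for this source
        d[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] == -1:            # first visit = shortest hop count
                    d[v] = d[u] + 1
                    queue.append(v)
    return dist
```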
Upvotes: 3