Bahar
Bahar

Reputation: 824

Efficiently compute the number of shortest path for a graph with 23000000 nodes using igraph

I am trying to compute the number of shortest path between 2 nodes which in the distance 2 of each other in a sparse graph which contains 23000000 vertices and around 9 X 23000000 edges. Right now I am using

for v,d,parent in graph.bfsiter(source.index, advanced=True):
        if (0 < d < 3):

to loop through the nodes which are within distance 2 of the source node (I need the nodes which are in distance 1 but I don't need to compute all shortest path for them). And then I use:

len (graph.get_all_shortest_paths(source,v));

to get the number of all shortest paths from source to v (where v is the node that bfsiter gives me which has the shortest distance 2 from the source).

However this is taking too long. For example for the graph described above it takes around 1 second to compute the shortest distance for each (source,v).

I was wondering if someone could suggest a more efficient way to compute the number of all shortest paths using igraph

Upvotes: 2

Views: 1180

Answers (1)

Will Beason
Will Beason

Reputation: 3551

Here is an implementation of the answer suggested in the comments. The time consuming part of this code is the graph generation. To run on an already generated/in-memory graph takes very little time.

from igraph import *
import random

# Generate a graph
numnodes = int(1e6)
thegraph = Graph.GRG(numnodes, 0.003)

print("Graph Generated")

# Choose one node randomly and another a distance of 2 away
node1 = random.randint(0, numnodes-1)
node2 = random.sample(set(Graph.neighborhood(thegraph, node1, 2)).difference(
                      set(Graph.neighborhood(thegraph, node1, 1))),1)[0]

# Find the number of nodes in the intersection of the neighborhood
# of order 1.
result = len(set(Graph.neighbors(thegraph, node1)).intersection(
    Graph.neighbors(thegraph, node2)))

print(result)

The intersection of the two neighborhoods is the number of unique paths. A path of length 2 visits 3 nodes. Since we know the start and end point, the only one which may vary is the middle. Since the middle node must be a distance 1 from both of the endpoints, the number of unique middle points is the number of paths of length 2 between the nodes.

Upvotes: 2

Related Questions