Find first and second order contacts of each node in a network

Question

I have a graph having 602647 nodes and 982982 edges. I wanted to find the first and second order contacts (i.e. 1-hop contacts and 2-hops contacts) for each node in the graph in Networkx.

i built the following code that worked fine for smaller graphs, but never finished running for larger (graphs as the one above):

hop_1 = {}
hop_2 = {}
row_1 = {}
row_2 = {}

for u, g in G.nodes(data=True):
    row_1.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=1))
    row_2.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=2))
    hop_1.update(row_1)
    hop_2.update(row_2)

some notes:

results are stored first in a dict (hope_1 and hope_2)
row_1 and row_2 and temporary holding variables
hop-1 will include nodes after one jump
hop-2 will include nodes that are located at both one jump and two jumps

Is there a way to optimize/imrpove this code and finish running?

Sam Mason · Accepted Answer

I don't know networkx; but, by definition, a node that is reachable one hop is also reachable in <=2 hops, which is what the docs (and source) of single_source_shortest_path_length is giving you. you can therefore remove the first call to single_source_shortest_path_length.

second, your uses of dictionaries are very strange! why are you using setdefault rather than just setting elements? also you're copying things a lot with update which doesn't do anything useful and just wastes time.

I'd do something like:

hop_1 = {}
hop_2 = {}

for u in G.nodes():
    d1 = []
    d2 = []
    for v, n in nx.single_source_shortest_path_length(G, u, cutoff=2).items():
        if n == 1:
            d1.append(v)
        elif n == 2:
            d2.append(v)
    hop_1[u] = d1
    hop_2[u] = d2

which takes about a minute on my laptop with a G_nm graph as generated by:

import networkx as nx

G = nx.gnm_random_graph(602647, 982982)

note that tqdm is nice for showing progress of long running loops, just import tqdm and change the outer for loop to be:

for u in tqdm(G.nodes()):
   ...

and you'll get a nice bar reporting progress

Find first and second order contacts of each node in a network

Answers (2)

Related Questions