Reputation: 1189
I have a graph having 602647 nodes and 982982 edges. I wanted to find the first and second order contacts (i.e. 1-hop contacts and 2-hops contacts) for each node in the graph in Networkx.
i built the following code that worked fine for smaller graphs, but never finished running for larger (graphs as the one above):
hop_1 = {}
hop_2 = {}
row_1 = {}
row_2 = {}
for u, g in G.nodes(data=True):
row_1.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=1))
row_2.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=2))
hop_1.update(row_1)
hop_2.update(row_2)
some notes:
results are stored first in a dict (hope_1 and hope_2)
row_1 and row_2 and temporary holding variables
hop-1 will include nodes after one jump
hop-2 will include nodes that are located at both one jump and two jumps
Is there a way to optimize/imrpove this code and finish running?
Upvotes: 0
Views: 590
Reputation: 17882
To find first and second-order neighbors you can use functions all_neighbors()
and node_boundary()
:
hop1 = {}
hop2 = {}
for n in G.nodes():
neighbs1 = list(nx.all_neighbors(G, n))
hop1[n] = neighbs1
hop2[n] = list(nx.node_boundary(G, neighbs1 + [n])) + neighbs1
print(hop1)
# {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [2, 3]}
print(hop2)
# {0: [4, 1, 2, 3], 1: [4, 0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [0, 1, 2, 3]}
Upvotes: 2
Reputation: 16204
I don't know networkx
; but, by definition, a node that is reachable one hop is also reachable in <=2 hops, which is what the docs (and source) of single_source_shortest_path_length
is giving you. you can therefore remove the first call to single_source_shortest_path_length
.
second, your uses of dictionaries are very strange! why are you using setdefault
rather than just setting elements? also you're copying things a lot with update
which doesn't do anything useful and just wastes time.
I'd do something like:
hop_1 = {}
hop_2 = {}
for u in G.nodes():
d1 = []
d2 = []
for v, n in nx.single_source_shortest_path_length(G, u, cutoff=2).items():
if n == 1:
d1.append(v)
elif n == 2:
d2.append(v)
hop_1[u] = d1
hop_2[u] = d2
which takes about a minute on my laptop with a G_nm
graph as generated by:
import networkx as nx
G = nx.gnm_random_graph(602647, 982982)
note that tqdm
is nice for showing progress of long running loops, just import tqdm
and change the outer for
loop to be:
for u in tqdm(G.nodes()):
...
and you'll get a nice bar reporting progress
Upvotes: 1