Taie
Taie

Reputation: 1189

Find first and second order contacts of each node in a network

I have a graph having 602647 nodes and 982982 edges. I wanted to find the first and second order contacts (i.e. 1-hop contacts and 2-hops contacts) for each node in the graph in Networkx.

i built the following code that worked fine for smaller graphs, but never finished running for larger (graphs as the one above):

hop_1 = {}
hop_2 = {}
row_1 = {}
row_2 = {}

for u, g in G.nodes(data=True):
    row_1.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=1))
    row_2.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=2))
    hop_1.update(row_1)
    hop_2.update(row_2)

some notes:

Is there a way to optimize/imrpove this code and finish running?

Upvotes: 0

Views: 590

Answers (2)

Mykola Zotko
Mykola Zotko

Reputation: 17882

To find first and second-order neighbors you can use functions all_neighbors() and node_boundary():

enter image description here

hop1 = {}
hop2 = {}

for n in G.nodes():
    neighbs1 = list(nx.all_neighbors(G, n))
    hop1[n] = neighbs1
    hop2[n] = list(nx.node_boundary(G, neighbs1 + [n])) + neighbs1

print(hop1)
# {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [2, 3]}
print(hop2)
# {0: [4, 1, 2, 3], 1: [4, 0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [0, 1, 2, 3]}

Upvotes: 2

Sam Mason
Sam Mason

Reputation: 16204

I don't know networkx; but, by definition, a node that is reachable one hop is also reachable in <=2 hops, which is what the docs (and source) of single_source_shortest_path_length is giving you. you can therefore remove the first call to single_source_shortest_path_length.

second, your uses of dictionaries are very strange! why are you using setdefault rather than just setting elements? also you're copying things a lot with update which doesn't do anything useful and just wastes time.

I'd do something like:

hop_1 = {}
hop_2 = {}

for u in G.nodes():
    d1 = []
    d2 = []
    for v, n in nx.single_source_shortest_path_length(G, u, cutoff=2).items():
        if n == 1:
            d1.append(v)
        elif n == 2:
            d2.append(v)
    hop_1[u] = d1
    hop_2[u] = d2

which takes about a minute on my laptop with a G_nm graph as generated by:

import networkx as nx

G = nx.gnm_random_graph(602647, 982982)

note that tqdm is nice for showing progress of long running loops, just import tqdm and change the outer for loop to be:

for u in tqdm(G.nodes()):
   ...

and you'll get a nice bar reporting progress

Upvotes: 1

Related Questions