Malik Hanzala
Malik Hanzala

Reputation: 137

Find all inexistent connections in graph

I have a pandas dataframe Edges with two columns Node 1 | Node 2

      Node1     Node2
 0      A         B
 1      C         B
 2      C         D

These basically show edges (A,B) | (C,B) | (C,D) in a graph

What i need to find out is Missing edges i.e not present (A,D) | (B,D) | (A,C)

Desired output

      Node1     Node2
 0      A         D
 1      A         C
 2      B         D

What i tried:

i converted DataFrame into networkx graph and then used nx.non_edges function for same purpose ( to find missing edges) but due to low hardware resources networkx fills up the RAM and notebook crashes . I'm trying to find if it is possible through pandas Dataframe that i could get missing edges of graph or you can say i need to find COMPLEMENT of graph.

Upvotes: 4

Views: 243

Answers (1)

yatu
yatu

Reputation: 88236

One possible approach could be as follows:

  • Find all length 2 combinations of the nodes
  • Iterate over them
  • Keep the combinations not contained in G.edges

from itertools import combinations

G = nx.from_pandas_edgelist(df, source='Node1', target='Node2')

edge_non_present = []
edges = set(G.edges())
for possible_edge in combinations(G.nodes(), r=2):
    if possible_edge not in edges:
        edge_non_present.append(possible_edge)

print(edge_non_present)
# [('A', 'C'), ('A', 'D'), ('B', 'D')]

Update

If this leads to a huge number of combinations, due to a high amount of nodes, take a slice of the returned generator, and only iterate on a subset of these:

from itertools import islice

G = nx.from_pandas_edgelist(df, source='Node1', target='Node2')
n_comb = 100

edge_non_present = []
edges = set(G.edges())
for possible_edge in islice(combinations(G.nodes(), r=2), 0, n_comb):
    if possible_edge not in edges:
        edge_non_present.append(possible_edge)

Upvotes: 4

Related Questions