Reputation: 137
I have a pandas dataframe Edges with two columns Node 1 | Node 2
Node1 Node2
0 A B
1 C B
2 C D
These basically show edges (A,B) | (C,B) | (C,D) in a graph
What i need to find out is Missing edges i.e not present (A,D) | (B,D) | (A,C)
Desired output
Node1 Node2
0 A D
1 A C
2 B D
What i tried:
i converted DataFrame into networkx graph and then used nx.non_edges function for same purpose ( to find missing edges) but due to low hardware resources networkx fills up the RAM and notebook crashes . I'm trying to find if it is possible through pandas Dataframe that i could get missing edges of graph or you can say i need to find COMPLEMENT of graph.
Upvotes: 4
Views: 243
Reputation: 88236
One possible approach could be as follows:
2
combinations of the nodesG.edges
from itertools import combinations
G = nx.from_pandas_edgelist(df, source='Node1', target='Node2')
edge_non_present = []
edges = set(G.edges())
for possible_edge in combinations(G.nodes(), r=2):
if possible_edge not in edges:
edge_non_present.append(possible_edge)
print(edge_non_present)
# [('A', 'C'), ('A', 'D'), ('B', 'D')]
Update
If this leads to a huge number of combinations, due to a high amount of nodes, take a slice of the returned generator, and only iterate on a subset of these:
from itertools import islice
G = nx.from_pandas_edgelist(df, source='Node1', target='Node2')
n_comb = 100
edge_non_present = []
edges = set(G.edges())
for possible_edge in islice(combinations(G.nodes(), r=2), 0, n_comb):
if possible_edge not in edges:
edge_non_present.append(possible_edge)
Upvotes: 4