Reputation: 328
I have a dataframe with two columns as:
CHILD PARENT
1 2
2 3
3 4
10 11
11 12
I need to create a dictionary where I keep the top parent as key and all its descendents as set of values, like:
4: [1,2,3]
12: [10,11]
I have been able to extract 12 and 4 as top parents from this dataframe, by code from following link:
extract column value based on another column pandas dataframe
Now, I have no idea how to do this in python. In java I can do this by following a dfs. Any suggestions?
Upvotes: 1
Views: 2164
Reputation: 2188
Here is a BFS approach not based on networkx
which is a great Python package, though not part of the Python standard library.
Code:
from collections import defaultdict
import pandas as pd
df = pd.DataFrame(data=[[1, 2], [2, 3], [3, 4], [10, 11], [11, 12]],
columns=['CHILD', 'PARENT'])
# build graph
graph = defaultdict(set)
for child, parent in df[['CHILD', 'PARENT']].values:
graph[parent].add(child)
# identity root nodes
roots = []
for node in graph.keys():
if all(node not in children for children in graph.values()):
roots.append(node)
# find the descendents of each root node
result = {}
for root in roots:
visited = set()
unvisited = graph[root]
while unvisited:
visited |= unvisited
unvisited = set.union(*(graph[node] for node in unvisited)) - visited
result[root] = visited
print(result)
Output:
{4: {1, 2, 3}, 12: {10, 11}}
Upvotes: 1
Reputation: 323306
Here is the way from networkx
import networkx as nx
G=nx.from_pandas_edgelist(df, 'CHILD', 'PARENT')
l=list(nx.connected_components(G))
L=[dict.fromkeys(y,x) for x, y in enumerate(l)]
d={k: v for d in L for k, v in d.items()}
df.groupby(df.CHILD.map(d)).agg({'CHILD':'unique','PARENT':'max'})
Out[328]:
PARENT CHILD
CHILD
0 4 [1, 2, 3]
1 12 [10, 11]
Upvotes: 3