Finding all descendants in a tree

I have a DataFrame like this:

d = {'Parent': ['abc', 'abc', 'def', 'mno'], 'Child': ['def', 'ghi', 'jkl', 'pqr']}
df = pd.DataFrame(data=d)

and would like to get a DataFrame like this:

d2 = {'Ancestor': ['abc', 'abc', 'abc', 'mno'], 'Descendant': ['def', 'ghi', 'jkl', 'pqr']}
df2 = pd.DataFrame(data=d2)

where abc and mno are the only root ancestors, and every other node is listed as a descendant of its root.

So far I have tried networkx, but without any luck.

EDIT: The example shows only three tiers, but the tree can have any number of tiers.

Upvotes: 0

Views: 50

Answers (1)

Scott Boston

Reputation: 153500

I think you can do this using networkx with a directed graph:

import pandas as pd
import networkx as nx

d = {'Parent': ['abc', 'abc', 'def', 'mno'], 'Child': ['def', 'ghi', 'jkl', 'pqr']}
df = pd.DataFrame(data=d)
dG = nx.from_pandas_edgelist(df, 'Parent', 'Child', create_using=nx.DiGraph())
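# Nodes that appear in the Child column cannot be roots
children = set(df['Child'])
# For each child, pick the one ancestor that is never a child, i.e. its root
df2 = pd.DataFrame({'Ancestor': [next(a for a in nx.ancestors(dG, c) if a not in children) for c in df['Child']],
                    'Descendant': df['Child']})

df2

Output:

  Ancestor Descendant
0      abc        def
1      abc        ghi
2      abc        jkl
3      mno        pqr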
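
If networkx is not an option, the same root lookup can probably be done by climbing a child-to-parent map. This is only a rough sketch: `root_ancestors` is a name I made up, and it assumes every child has exactly one parent and that the data has no cycles (a cycle would make the loop run forever). It works for any number of tiers.

```python
def root_ancestors(parents, children):
    """Return (root, descendant) pairs for a forest given as parallel edge lists."""
    # Assumption: each child has exactly one parent, so a plain dict is enough.
    parent_of = dict(zip(children, parents))
    pairs = []
    for child in children:
        node = child
        # Climb until reaching a node that never appears as a child (a root).
        while node in parent_of:
            node = parent_of[node]
        pairs.append((node, child))
    return pairs


pairs = root_ancestors(['abc', 'abc', 'def', 'mno'],
                       ['def', 'ghi', 'jkl', 'pqr'])
print(pairs)
```

With the frame from the question, this should plug in as pd.DataFrame(root_ancestors(df['Parent'], df['Child']), columns=['Ancestor', 'Descendant']).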

Upvotes: 2
