bloo

Reputation: 326

Pandas - Recursively look for children in dataframe

Consider the below dataframe:

    id1    id2
0   aaa    111
1   bbb    222
2   333    ccc
3   999    zzz
4   ccc    111
5   888    zzz
6   zzz    222
7   ddd    888
8   eee    888

How can I recursively get a dataframe for every match of all the children, and all of their descendants, of a given input? In my case, input = [111, 222], i.e.:
Parent1: 111
Child1: aaa
Child2: ccc (from row 4)
Child of Child2: 333 (from row 2)

Parent2: 222
Child1: bbb
Child2: zzz (from row 6)
ChildA of Child2: 888 (from row 5)
ChildB of Child2: 999 (from row 3)
Child_i of ChildA: ddd (from row 7)
Child_ii of ChildA: eee (from row 8)

The expected output for every level (parent -> child -> child of child) would be:

### for i = 111
# parent level
     id1    id2
0    aaa    111
1    ccc    111

# child level
     id1    id2
0    333    ccc


### for i = 222
# parent level
     id1    id2
0    bbb    222
1    zzz    222

# child level
     id1    id2
0    888    zzz
1    999    zzz

# child of child level
     id1    id2
0    ddd    888    
1    eee    888    

I tried:

parents = [111, 222]

while len(parents) != 0:
    for i in parents:
        children = df[df['id2'].apply(lambda x: i in str(x))][['id1', 'id2']]
        print(children) #print dataframe of match
    parents = children['id1']

But it doesn't go all the way through. I thought of changing `i` in the lambda to a list comprehension, but I didn't manage to make it work.

Upvotes: 1

Views: 1060

Answers (2)

Serge Ballesta

Reputation: 148965

If you only want to print an indented graph, you could use a simple recursive function:

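def desc(i, indent=0):
    # print the current id, indented according to its depth
    print(' '*indent + i)
    # recurse into every row whose parent (id2) is the current id
    for j in df.loc[df['id2'] == i, 'id1']:
        desc(j, indent + 2)

for i in ('111', '222'):
    desc(i)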

With the example df, it gives:

111
  aaa
  ccc
    333
222
  bbb
  zzz
    999
    888
      ddd
      eee
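
If you need one dataframe per level rather than a printed tree, the same idea can be written as a level-by-level loop. This is only a minimal sketch, assuming both columns hold strings; `levels_for` is a hypothetical helper name, and the `seen` set is an extra guard against cycles that the example data does not actually need:

```python
import pandas as pd

df = pd.DataFrame({
    'id1': ['aaa', 'bbb', '333', '999', 'ccc', '888', 'zzz', 'ddd', 'eee'],
    'id2': ['111', '222', 'ccc', 'zzz', '111', 'zzz', '222', '888', '888'],
})

def levels_for(df, root):
    """Return a list of dataframes, one per level below `root`."""
    levels = []
    parents = [root]
    seen = {root}  # assumption: guard against cycles, unused by the sample data
    while parents:
        # rows whose parent (id2) is one of the ids found at the previous level
        children = df[df['id2'].isin(parents)]
        # skip ids that were already visited
        children = children[~children['id1'].isin(seen)]
        if children.empty:
            break
        levels.append(children.reset_index(drop=True))
        parents = children['id1'].tolist()
        seen.update(parents)
    return levels

for root in ('111', '222'):
    print(f'### for i = {root}')
    for depth, level in enumerate(levels_for(df, root)):
        print(f'# level {depth}')
        print(level)
```

Rows inside each level keep the order they have in df.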

Upvotes: 2

SultanOrazbayev

Reputation: 16561

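You can merge the dataframe with itself to attach a grandchild column. The result dataframe will also contain NaNs (for children that have no children of their own); if you want to drop them, use result.dropna():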

from io import StringIO
d = StringIO("""
ix    id1    id2
0   aaa    111
1   bbb    222
2   333    ccc
3   999    zzz
4   ccc    111
5   888    zzz
6   zzz    222
7   ddd    888
8   eee    888
""")

import pandas as pd

df = pd.read_csv(d, sep=r'\s+', index_col='ix')

df.columns

result = (
    df.rename(columns={'id2': 'id_parent', 'id1': 'id_child'})
    .merge(df.set_index('id2'), how='left', left_on='id_child', right_index=True)
    .rename(columns={'id1': 'id_grandchild'})
)

result

For example, here's a way to list all grandchildren:

result.dropna().groupby('id_parent')['id_grandchild'].agg(list).reset_index()

Here's a way to create a look-up dictionary that contains all the grandchildren for an individual:

dict_parents = result.dropna().groupby('id_parent')['id_grandchild'].agg(list).to_dict()
# e.g. try: print(dict_parents['222'])

Here's a way to get the result for specific individuals:

specific_ids = ['111', '222']

result = (
    df[df['id2'].isin(specific_ids)].rename(columns={'id2': 'id_parent', 'id1': 'id_child'})
    .merge(df.set_index('id2'), how='left', left_on='id_child', right_index=True)
    .rename(columns={'id1': 'id_grandchild'})
)

result.dropna()
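
Note that a single merge only reaches the grandchildren, so every extra level needs one more merge. A rough sketch of going one level further, assuming string ids; the column name `id_great_grandchild` is my own, the dataframe is rebuilt inline so the snippet runs by itself, and it uses plain column merges (`on=`) instead of `right_index=True`:

```python
import pandas as pd

df = pd.DataFrame({
    'id1': ['aaa', 'bbb', '333', '999', 'ccc', '888', 'zzz', 'ddd', 'eee'],
    'id2': ['111', '222', 'ccc', 'zzz', '111', 'zzz', '222', '888', '888'],
})

# the same parent/child table, relabelled once per generation
children = df.rename(columns={'id2': 'id_parent', 'id1': 'id_child'})
grandchildren = df.rename(columns={'id2': 'id_child', 'id1': 'id_grandchild'})
great = df.rename(columns={'id2': 'id_grandchild', 'id1': 'id_great_grandchild'})

result = (
    children[children['id_parent'].isin(['111', '222'])]
    .merge(grandchildren, how='left', on='id_child')   # adds id_grandchild
    .merge(great, how='left', on='id_grandchild')      # adds id_great_grandchild
)

print(result)
```

This works for a known, fixed depth; for an unknown depth a loop or a recursive function is the more natural fit.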

Upvotes: 0
