KnowNothing
KnowNothing

Reputation: 33

Find all possible connections between child and parent

I am looking for all the connections through the hierarchy from so called childs to parents. It is not entirely child-parent hierarchy, cause there could be many connections between one child and one parent through the “intermediate_parents”. And also one child can be linked to few parents, same vice versa. For example:

Child Parent Class
1 2 C
1 3 C
1 4 B
1 9 A
2 4 B
2 6 D
2 8 S
3 6 C
4 5 A
5 6 D

The outcome will be following:

Child Ultimate_Parent Class Path Connection
1 6 C 1-2-4-5-6 Indirect
1 6 C 1-2-6 Indirect
1 8 C 1-2-8 Indirect
1 6 C 1-3-6 Indirect
1 6 B 1-4-5-6 Indirect
1 9 A 1-9 Direct
2 6 B 2-4-5-6 Indirect
2 6 D 2-6 Direct
2 8 S 2-8 Direct
3 6 C 3-6 Direct
4 6 A 4-5-6 Indirect
5 6 D 5-6 Direct

Input in df:

df = pd.DataFrame({'Child': ['1', '1', '1', '1', '2', '2', '2', '3', '4', '5'], 
                   'Parent': ['2','3','4','9','4','6','8','6','5','6'],
                   'Class': ['C','C','B','A','B','D','S','C','A','D']})  

I first tried to resolve it with DiR graph, but still struggling to understand completely how it works. I already asked here Finding ultimate parent for help but wasn't entirely correct with my formulation of the question. And the answer is correct for that one, but I can't extrapolate it to this case.

Upvotes: 0

Views: 202

Answers (1)

Kota Mori
Kota Mori

Reputation: 6750

You can expand the data from the nodes with no parents. The code below is an example fully using pandas methods and produces the required output. If the speed is an issue, then you may be able to implement more efficiently without pandas because pandas is not a best tool for this type of recursive operations.

import pandas as pd

df = pd.DataFrame(
  {'Child': ['1', '1', '1', '1', '2', '2', '2', '3', '4', '5'], 
   'Parent': ['2','3','4','9','4','6','8','6','5','6'],
   'Class': ['C','C','B','A','B','D','S','C','A','D']}) 

out = None
while len(df) > 0:
  no_parent = set(df.Parent) - set(df.Child)
  tmp = df[df.Parent.isin(no_parent)]
  if out is None:
    out = tmp.rename(columns={"Child": "Parent", "Parent": "UltimateParent"})
    out["Path"] = out.Parent + "-" + out.UltimateParent
    out["Connection"] = "Direct"
  else:
    tmp = pd.merge(out, tmp, on="Parent", how="inner")
    tmp.Path = tmp.Child + "-" + tmp.Path
    tmp["Class"] = tmp.Class_y.fillna(tmp.Class_x)
    tmp["Parent"] = tmp.Child.fillna(tmp.Parent)
    tmp = tmp.drop(columns=["Child", "Class_x", "Class_y"])
    tmp["Connection"] = "Indirect"
    out = pd.concat((out, tmp), ignore_index=True)
  df = df[~df.Parent.isin(no_parent)]
out = out.rename(columns={"Parent": "Child"}).sort_values("Path")
out
 Child UltimateParent Class       Path Connection
     1              6     C  1-2-4-5-6   Indirect
     1              6     C      1-2-6   Indirect
     1              8     C      1-2-8   Indirect
     1              6     C      1-3-6   Indirect
     1              6     B    1-4-5-6   Indirect
     1              9     A        1-9     Direct
     2              6     B    2-4-5-6   Indirect
     2              6     D        2-6     Direct
     2              8     S        2-8     Direct
     3              6     C        3-6     Direct
     4              6     A      4-5-6   Indirect
     5              6     D        5-6     Direct

Upvotes: 2

Related Questions