Reputation: 33
I am looking for all the connections through the hierarchy from so called childs to parents. It is not entirely child-parent hierarchy, cause there could be many connections between one child and one parent through the “intermediate_parents”. And also one child can be linked to few parents, same vice versa. For example:
Child | Parent | Class |
---|---|---|
1 | 2 | C |
1 | 3 | C |
1 | 4 | B |
1 | 9 | A |
2 | 4 | B |
2 | 6 | D |
2 | 8 | S |
3 | 6 | C |
4 | 5 | A |
5 | 6 | D |
The outcome will be following:
Child | Ultimate_Parent | Class | Path | Connection |
---|---|---|---|---|
1 | 6 | C | 1-2-4-5-6 | Indirect |
1 | 6 | C | 1-2-6 | Indirect |
1 | 8 | C | 1-2-8 | Indirect |
1 | 6 | C | 1-3-6 | Indirect |
1 | 6 | B | 1-4-5-6 | Indirect |
1 | 9 | A | 1-9 | Direct |
2 | 6 | B | 2-4-5-6 | Indirect |
2 | 6 | D | 2-6 | Direct |
2 | 8 | S | 2-8 | Direct |
3 | 6 | C | 3-6 | Direct |
4 | 6 | A | 4-5-6 | Indirect |
5 | 6 | D | 5-6 | Direct |
Input in df:
df = pd.DataFrame({'Child': ['1', '1', '1', '1', '2', '2', '2', '3', '4', '5'],
'Parent': ['2','3','4','9','4','6','8','6','5','6'],
'Class': ['C','C','B','A','B','D','S','C','A','D']})
I first tried to resolve it with DiR graph, but still struggling to understand completely how it works. I already asked here Finding ultimate parent for help but wasn't entirely correct with my formulation of the question. And the answer is correct for that one, but I can't extrapolate it to this case.
Upvotes: 0
Views: 202
Reputation: 6750
You can expand the data from the nodes with no parents. The code below is an example fully using pandas methods and produces the required output. If the speed is an issue, then you may be able to implement more efficiently without pandas because pandas is not a best tool for this type of recursive operations.
import pandas as pd
df = pd.DataFrame(
{'Child': ['1', '1', '1', '1', '2', '2', '2', '3', '4', '5'],
'Parent': ['2','3','4','9','4','6','8','6','5','6'],
'Class': ['C','C','B','A','B','D','S','C','A','D']})
out = None
while len(df) > 0:
no_parent = set(df.Parent) - set(df.Child)
tmp = df[df.Parent.isin(no_parent)]
if out is None:
out = tmp.rename(columns={"Child": "Parent", "Parent": "UltimateParent"})
out["Path"] = out.Parent + "-" + out.UltimateParent
out["Connection"] = "Direct"
else:
tmp = pd.merge(out, tmp, on="Parent", how="inner")
tmp.Path = tmp.Child + "-" + tmp.Path
tmp["Class"] = tmp.Class_y.fillna(tmp.Class_x)
tmp["Parent"] = tmp.Child.fillna(tmp.Parent)
tmp = tmp.drop(columns=["Child", "Class_x", "Class_y"])
tmp["Connection"] = "Indirect"
out = pd.concat((out, tmp), ignore_index=True)
df = df[~df.Parent.isin(no_parent)]
out = out.rename(columns={"Parent": "Child"}).sort_values("Path")
out
Child UltimateParent Class Path Connection
1 6 C 1-2-4-5-6 Indirect
1 6 C 1-2-6 Indirect
1 8 C 1-2-8 Indirect
1 6 C 1-3-6 Indirect
1 6 B 1-4-5-6 Indirect
1 9 A 1-9 Direct
2 6 B 2-4-5-6 Indirect
2 6 D 2-6 Direct
2 8 S 2-8 Direct
3 6 C 3-6 Direct
4 6 A 4-5-6 Indirect
5 6 D 5-6 Direct
Upvotes: 2