Reputation: 79
Parent) as csv (700,000 rows) input
Child Parent
fA00 f0
fA9 fA0
fA31 fA0
fA30 fA0
fA1 fA00
dccfA1 fA00
fA2 fA00
fA3 fA00
fA01 fA00
fA4 fA00
fA5 fA00
fA6 fA00
fA7 fA00
fA0 fA00
fA142149 fA00
fA02 fA00
fA8 fA00
qA1 fA10
fA22 fA10
fA23 fA10
fA11 fA10
qA2 fA10
fA15 fA11
fA13 fA11
fA12 fA11
fA14 fA13
fA17 fA16
fA18 fA17
fA19 fA17
fA20 fA17
fA21 fA19
etc....
It goes up to 14 levels deep. The top parent is f0
I want to iterate through the child parent relationships to determine the path
Expected Result
f0 --- top
f0\fa00
f0\fa00\.Child
f0\fa00\.Child2etc
f0\fA0
f0\fA0\.Child
f0\fA0\.Child2etc
How can I do this in Python?
Upvotes: 1
Views: 4382
Reputation: 1773
I started out thinking complicated recursive construction of tree structures but basically it is very simple. Create a mapping of child to parent then starting at a child list its parent then the parent's parent up to the top. A recursive routine extracts the child's ancestry easily.
'''
This is the family tree:
------------------------
f0:
a0:
b0
b1:
b2:
a1:
b3:
b4:
a2:
b5:
c0
c1
'''
ancestry = [
('b1', 'a0'),
('c1', 'b5'),
('b2', 'a0'),
('b3', 'a1'),
('b4', 'a1'),
('b5', 'a2'),
('a0', 'f0'),
('a1', 'f0'),
('a2', 'f0'),
('b0', 'a0'),
('c0', 'b5'),
]
And the code is:
parents = set()
children = {}
for c,p in ancestry:
parents.add(p)
children[c] = p
# recursively determine parents until child has no parent
def ancestors(p):
return (ancestors(children[p]) if p in children else []) + [p]
# for each child that has no children print the geneology
for k in (set(children.keys()) - parents):
print '/'.join(ancestors(k))
Output is:
f0/a1/b4
f0/a0/b0
f0/a0/b1
f0/a0/b2
f0/a1/b3
f0/a2/b5/c1
f0/a2/b5/c0
I'll leave it as an exercise to read the csv file, and maybe sort the outputs better.
Upvotes: 5