I am trying to find all the paths in one or more connected trees using Python. For example, if my data is:
```python
a = pd.DataFrame({'predecessor': [1,2,1,4,5,5,5,7,7,10,11,11,8,8,8,14,14,14,16,16,21,16,15,15],
                  'successor': [2,3,4,5,6,7,8,9,10,11,12,13,17,18,19,8,15,16,20,21,23,22,19,20]})
```
Each row means the predecessor is linked to the successor, so this data forms one or more connected trees.
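Reading each row as a directed edge from `predecessor` to `successor`, a minimal sketch of the adjacency mapping could look like this (the names `edges`, `children` and `shared` are illustrative, not from the post). It also shows that nodes 8, 19 and 20 each have two predecessors, so strictly speaking the data is a directed acyclic graph rather than disjoint trees, and a path search has to allow one node to be reached from more than one root.

```python
from collections import Counter, defaultdict

import pandas as pd

a = pd.DataFrame({'predecessor': [1, 2, 1, 4, 5, 5, 5, 7, 7, 10, 11, 11, 8, 8, 8, 14, 14, 14, 16, 16, 21, 16, 15, 15],
                  'successor': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 8, 15, 16, 20, 21, 23, 22, 19, 20]})

# Each row is a directed edge predecessor -> successor (tolist() gives plain ints)
edges = list(zip(a['predecessor'].tolist(), a['successor'].tolist()))

# Adjacency mapping: node -> list of its successors, in row order
children = defaultdict(list)
for parent, child in edges:
    children[parent].append(child)

# Count predecessors per node; nodes with more than one are shared between branches
parent_count = Counter(child for _, child in edges)
shared = sorted(node for node, count in parent_count.items() if count > 1)

print(children[5])  # [6, 7, 8]
print(shared)       # [8, 19, 20]
```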
What I want is all the paths. A path looks like [1,2,3] or [1,4,5,7,10,11,13]. My real data is huge, so storing all the paths in a DataFrame is not a good idea. A list of lists, in which every sub-list is one complete path, might work. I hope the result looks like:
```
[[1,2,3],
 [1,4,5,7,10,11,13],
 [14,8,17],
 [14,16,21,23],
 ......]
```
So, could anyone help me out here?
```python
import pandas as pd

a = pd.DataFrame({'predecessor': [1, 2, 1, 4, 5, 5, 5, 7, 7, 10, 11, 11, 8, 8, 8, 14, 14, 14, 16, 16, 21, 16, 15, 15],
                  'successor': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 8, 15, 16, 20, 21, 23, 22, 19, 20]})

# Loop once to store all the parent-child relations and find the root nodes and end nodes.
# A root node is a node that appears only in 'predecessor', never in 'successor'.
# An end node is a node that appears only in 'successor', never in 'predecessor'.
predecessors = a['predecessor'].tolist()
successors = a['successor'].tolist()
# Build the lookup sets once; checking `in a['successor'].values` on every row is O(n) each time
predecessor_set = set(predecessors)
successor_set = set(successors)
root_nodes = set()
end_nodes = set()
node_relations = {}
for predecessor, successor in zip(predecessors, successors):
    node_relations.setdefault(predecessor, []).append(successor)
    if predecessor not in successor_set:
        root_nodes.add(predecessor)
    if successor not in predecessor_set:
        end_nodes.add(successor)

# DFS + memoization: memory[node] caches every route from node down to an end node
def get_routes(root, memory):
    # Routes from this node were already computed (e.g. node 8 is reached from 5 and 14)
    if root in memory:
        return memory[root]
    # An end node's only route is the node itself
    if root in end_nodes:
        memory[root] = [[root]]
        return memory[root]
    # Prepend the current node to every route of each successor
    memory[root] = []
    for successor in node_relations[root]:
        for route in get_routes(successor, memory):
            memory[root].append([root] + route)
    return memory[root]

# Start the search from every root node (sorted for a deterministic order)
memory = {}
result = []
for root in sorted(root_nodes):
    result.extend(get_routes(root, memory))
print(result)
```
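The recursive `get_routes` keeps the route lists of every visited node in `memory`, which saves work when subtrees are shared, but it means all of those lists sit in RAM at once, and very deep chains can hit CPython's default recursion limit (1000). Since the question says the real data is huge, one alternative is an iterative depth-first search written as a generator, so paths are produced one at a time. This is a sketch under the assumption that the graph has no cycles, not the answerer's method; `iter_paths` is a hypothetical name, and it uses the same root and end-node definitions as above.

```python
from collections import defaultdict
from typing import Dict, Iterator, List

import pandas as pd


def iter_paths(df: pd.DataFrame) -> Iterator[List[int]]:
    """Yield each root-to-end-node path as a list, one at a time (iterative DFS)."""
    predecessors = df['predecessor'].tolist()
    successors = df['successor'].tolist()

    # Adjacency mapping: node -> its successors, in row order
    children: Dict[int, List[int]] = defaultdict(list)
    for parent, child in zip(predecessors, successors):
        children[parent].append(child)

    # Root nodes never appear as a successor
    roots = sorted(set(predecessors) - set(successors))

    for root in roots:
        # Each stack entry is a partial path; its last node is the next one to expand.
        # Assumes no cycles: a cycle would make this loop run forever.
        stack = [[root]]
        while stack:
            path = stack.pop()
            # .get() does not insert into the defaultdict; end nodes return None
            next_nodes = children.get(path[-1])
            if not next_nodes:
                # No successors: this is an end node, so the path is complete
                yield path
                continue
            # Push in reverse so successors are explored in their row order
            for child in reversed(next_nodes):
                stack.append(path + [child])


a = pd.DataFrame({'predecessor': [1, 2, 1, 4, 5, 5, 5, 7, 7, 10, 11, 11, 8, 8, 8, 14, 14, 14, 16, 16, 21, 16, 15, 15],
                  'successor': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 8, 15, 16, 20, 21, 23, 22, 19, 20]})

# Consume lazily, e.g. write each path out instead of keeping them all
for path in iter_paths(a):
    print(path)
```

`list(iter_paths(a))` still gives the list-of-lists shape from the question and contains the same set of paths as the recursive version. The trade-off is that a subtree reachable from several roots (such as the one under node 8) is walked again for each route into it, while only the partial paths on the current DFS stack are held in memory.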