Feng Chen

Reputation: 2253

How to find all the paths in a tree (or multiple connected trees) in Python?

I am trying to find all the paths in one or more connected trees using Python. For example, if my data is:

a=pd.DataFrame({'predecessor':[1,2,1,4,5,5,5,7,7,10,11,11,8,8,8,14,14,14,16,16,21,16,15,15],
                'successor':[2,3,4,5,6,7,8,9,10,11,12,13,17,18,19,8,15,16,20,21,23,22,19,20]})

Each predecessor/successor pair means the two numbers are linked, so the trees built from this data look like:

[Image: diagram of the resulting trees]

What I want is all the paths. One path looks like [1,2,3] or [1,4,5,7,10,11,13]. My real data is huge, so storing all the paths in a data frame is not a good idea. A list of lists, in which every sublist is a complete path, might work better. I hope the result looks like:

[[1,2,3], 
 [1,4,5,7,10,11,13],
 [14,8,17],
 [14,16,21,23],
 ......]

So, could anyone help me out here?

Upvotes: 1

Views: 1120

Answers (1)

Jack Song

Reputation: 478

import pandas as pd


a = pd.DataFrame({'predecessor': [1, 2, 1, 4, 5, 5, 5, 7, 7, 10, 11, 11, 8, 8, 8, 14, 14, 14, 16, 16, 21, 16, 15, 15],
                  'successor': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 8, 15, 16, 20, 21, 23, 22, 19, 20]})

# Build the parent -> children map and find the root nodes and end nodes.
# A root node appears in 'predecessor' but never in 'successor'.
# An end node appears in 'successor' but never in 'predecessor'.
# Set differences avoid scanning the whole column once per row.
predecessors = set(a['predecessor'])
successors = set(a['successor'])
root_nodes = predecessors - successors
end_nodes = successors - predecessors
node_relations = {}
for predecessor, successor in zip(a['predecessor'], a['successor']):
    node_relations.setdefault(predecessor, []).append(successor)

# DFS + memoization: routes from each node are computed once and cached
def get_routes(root, memory):
    # when already in memory
    if root in memory.keys():
        return memory[root]
    # when it is the end node, return node itself as the routes
    if root in end_nodes:
        memory[root] = [[root]]
        return memory[root]
    # Loop all the successor routes and add root node before all of them
    memory[root] = []
    for successor in node_relations[root]:
        for route in get_routes(successor, memory):
            memory[root].append([root] + route)
    return memory[root]

# Loop from root nodes
memory = {}
result = []
for root in root_nodes:
    result.extend(get_routes(root, memory))
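
Since the real data is described as huge, it may also be worth producing the paths lazily instead of building the full list in memory. Below is a minimal sketch of that idea, not a drop-in replacement for the code above: it swaps the recursive, memoized DFS for an iterative DFS with an explicit stack, written as a generator. The name iter_paths and the small edges list are hypothetical, used only for illustration, and the sketch assumes the graph has no cycles.

```python
from collections import defaultdict


def iter_paths(edges):
    """Yield every root-to-end path one at a time (assumes no cycles)."""
    children = defaultdict(list)
    predecessors, successors = set(), set()
    for parent, child in edges:
        children[parent].append(child)
        predecessors.add(parent)
        successors.add(child)

    # Roots never appear as a successor; sorted only to keep the order stable.
    roots = sorted(predecessors - successors)

    for root in roots:
        # Each stack entry is a partial path; an explicit stack avoids
        # Python's recursion limit on very deep trees.
        stack = [[root]]
        while stack:
            path = stack.pop()
            kids = children.get(path[-1])
            if not kids:
                yield path  # reached an end node
                continue
            # Push in reverse so children are visited in their original order.
            for child in reversed(kids):
                stack.append(path + [child])


# Hypothetical sample: a small subset of the edges from the question.
edges = [(1, 2), (2, 3), (1, 4), (4, 5), (5, 6)]
for path in iter_paths(edges):
    print(path)
```

With the data frame from the question, the edges could be passed as zip(a['predecessor'], a['successor']). The trade-off is that nothing is cached, so sub-paths shared by several routes are rebuilt each time, but only one path plus the stack needs to be held in memory at once.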

Upvotes: 1
