Easier way to find nodes in a dataframe

Question

Given a parent, I would like to get its nodes, of level N (original dataframe is way larger)

child|   level|  parent|
   40|       1|      40|
   21|       2|      40|
   18|       2|      40|
   37|       3|      21|
    2|       3|      18|
   85|       3|      21|
   14|       4|      37|
   58|       4|       2|
   47|       4|      37|
   34|       4|      85|
   45|       4|      18|
   32|       4|       2|
   47|       4|      85|
   88|       4|      85|
   12|       4|      37|

What I do:

def get_children(x, tree, lst):
    for nod in tree[tree['parent'] == x]['child']:
        get_children(nod, tree, lst)
        lst.append(nod)

and then I filter all nodes of level N.

I would like another way because there are too much recursion calls when the dataframe is larger.

mrhd · Accepted Answer

Using an auxiliary data structure you could solve this task in two steps:

store all predecessors of each node
for each node of level N, check if node x is in the list of predecessors

Without further assumptions (e.g. that the adjacency list is ordered, and higher level nodes cannot be parents of lower level ones), you would need to cycle through the list at least twice (once to create the parents index, then once more for each lookup). But if the aforementioned assumptions can be made, a single loop would be sufficient.

# determine all predecessors of each node in tree
def index_parents(tree):
  parents = {}
  for i, n in tree.iterrows():
    child = int(n['child']) 
    parent = int(n['parent'])
    # if this child node is seen for the first time, initialize parents
    if child not in parents: 
      parents[child] = set()
    # add current parent to predecessor list
    parents[child].update(set([parent]))
    # add predecessors of parent to predecessor list
    parents[child].update(parents[parent])

  return parents

# check all nodes at given level if they contain x as predecessor
def get_children(x, tree, level, parents):
  x_children = []
  for i, n in tree.iterrows():
    child = int(n['child'])
    if n['level'] == level and x in parents[child]:
      # x is in predecessors of child, add it to the result
      x_children.append(child)

  return x_children

A test given you test data:

>>> parents = index_parents(tree)
>>> get_children(21, tree, 3, parents)                                                                                    
>>> [37, 85]

It may seem a bit backwards, but it avoids the recursion.

Easier way to find nodes in a dataframe

Answers (1)

Related Questions