Stefan Seemayer
Stefan Seemayer

Reputation: 2067

How to get all parents for all leaves in a tree quickly?

I have a possibly large rooted tree structure that I want to transform into a X * Y matrix with X being the amount of leaves in the tree and Y being the amount of nodes in the tree with a degree larger than 1, i.e. the root node and internal nodes. The matrix should be filled as such:

Mi,j = { 0 if leaf i has ancestor j, 1 otherwise

For example, this tree:

      --A 
     /
    1   B
   / \ /
  /   3  
 /     \
0       C
 \ 
  \   --D
   \ /
    2
     \--E

would translate into this matrix:

  0 1 2 3
A T T F F
B T T F T
C T T F T
D T F T F
E T F T F

Since the trees might become pretty big (possibly ~100,000 leaves), I was wondering if there is a smarter/faster way of doing this than traversing up the tree for every one of the leaf nodes. It feels like some kind of algorithm is in this problem somewhere, but I haven't figured it out yet. Maybe someone can help?

In my application, the tree represents large phylogenetic hierarchies, so it is not balanced and there might be nodes with more than two children.

Upvotes: 2

Views: 2227

Answers (1)

amit
amit

Reputation: 178411

I'd go with post-order traversal.

Maintain lists of leaves while traversing the tree, and in each level - the list will contain all the leaves up to this level.

decalrations for functions we will use:

list merge(list1,list2) //merges two lists and creates a new list
list create() // creates a new empty list
void add(list,e) // appends e to the list
void setParent(leaf,node) //sets (in your matrix) node as a parent of leaf

Pseudo code:

list Traverse(root):
  if (root == nil):
      l <- create()
      return l
  else if (root is leaf):
      l <- create()
      add(l,root)
      return l
  else: 
      l1 <- Traverse(root.left)
      l2 <- Traverse(root.right)
      l <- merge(l1,l2)
      for each leaf in l:
          setParent(leaf,root)
      return l

Time is O(n*m) - for setting the matrix (though the algorithm itself is O(nlogn) time for a balanced tree).

If you want to prevent the O(n*m) initialization you can initialize the matrix in O(1), and then run the algorithm above in O(nlogn). Though it will give better asymptotic complexity, I doubt it will actually be faster.

Upvotes: 1

Related Questions