Haskar P
Haskar P

Reputation: 420

How to generate all trees having n-nodes and m-level depth? The branching factor is variable and need not be constant within the tree itself

Quite straightforward question I believe.^

For example a 5 node 3 level algorithm would eliminate

 A
 |
 B
 |
 C
 |
 D
 |
 E 

but not

    A
   /|\
  B C D
       \
        E

which is the same as (atleast to me)

    A
   /|\
  D C B
 /
E

For example a 3-Node 2-Level algorithm would generate the following only:

   (1)               (2)             (3)

    A                 B               C
   / \        or     / \       or    / \ 
  B   C             A   C           A   B

Notice that I consider the tree below to be the same as (2) above:

       B
      / \
     C   A

Upvotes: 2

Views: 593

Answers (1)

rici
rici

Reputation: 241861

The basic structure of the solution will be recursive, although the precise details vary depending on the precise equivalence classes you are assuming for the generation. Here you'll need to consider questions such as whether the nodes are labelled or not, and whether the children of a node are ordered or not. Each of these choices defines a very different but equally interesting generation problem.

The essential algorithm is to figure out an enumeration of possible sets of children, obeying the equivalence classes defined for the particular problem by producing only a canonical entry for each equivalence class. For unlabelled trees with ordered children, for example, we might start by enumerating all the ordered partitions of the node count; for a given partition, we would form the Cartesian product of all the different possible subtrees of the indicated sizes. If we dodn't care about the order of the children, we'd need to find some way of making canonical lists of children: we could first sort the subtrees by total size, but then we'd need a canonical order for two subtrees of the same size. This gets tricky, but there are solutions.

I gather that your problem does not just have a number of nodes but rather specific node labels, and therefore two trees with the same shape but different labellings are to be considered distinct. However, you want the order of children to be irrelevant to the comparison. The fact that all nodes have unique labels makes the problem quite tractable: we can enumerate non-empty subsets of the available labels because we know that the trees generated with one subset of the labels must all be distinct from the trees generated with a different subset. Since the subtrees themselves are unordered, we can canonicalise by keeping the subtree roots sorted (any ordering is as good as any other).

So we end up with the procedure GenTrees(R, D, N), where R is the root label, D is the set of descendant labels, and N is the maximum tree depth, which can be defined as follows:

  • If D is empty, return a singleton set with the tree consisting only of the indicated root node.

  • (For efficiency) If N is 1, return a singleton set with the tree consisting of the indicated root node with the remaining nodes as leaves. (The less efficient step is "If N is 0, return the empty set." But it's better to shortcut by checking for N == 1.)

  • Otherwise, start with an empty set of possible trees, and

  • For every non-empty subset S of D (these are the children of the root node):

    • For every ordered partition P of D - S into |S| subsets (these are the remaining descendants of each subtree):

      • Add to the return set all trees with root R whose children are in GenTrees(Si, Pi, N-1). (This is a cartesian product. Here, the ordering of S is arbitrary but must be consistent; sorting the labels into ascending order would be an obvious solution.)

By way of testing the above a little, I wrote a sample implementation in Python. It makes no attempt to be efficient, and is basically a transcription of the above using python generators. (The advantage of generators is that we don't have to fill memory with the millions of possible trees; we just generate one at a time.)

In case it is clearer than the verbiage, here it is:

from functools import reduce
from itertools import product

class tree(object):
  def __init__(self, root, kids=()):
    self.root = root
    self.kids = tuple(kids)

  def __str__(self):
    '''Produce an S-expr for the tree.'''
    if self.kids:
      return "(%s %s)" % (self.root,
                          ' '.join(str(kid) for kid in self.kids))
    else:
      return self.root

def append(part, wherewhat):
  part[wherewhat[0]].append(wherewhat[1])
  return part

def gentrees(root, labels, N):
  if not labels or N <= 1:
    yield tree(root, labels if N else ())
  else:
    for p in product((0,1), repeat = len(labels)):
      if any(p):
        nexts, roots = reduce(append, zip(p, labels), ([], []))
        for q in product(range(len(roots)), repeat = len(nexts)):
          kidsets = reduce(append, zip(q, nexts), tuple([] for i in roots))
          yield from (tree(root, kids)
                      for kids in product(*(gentrees(root, rest, N-1)
                                            for root, rest in zip(roots, kidsets))))    

def alltrees(labels, N):
  for i, root in enumerate(labels):
    yield from gentrees(root, labels[:i] + labels[i+1:], N)

The last function iterates over all possible roots and calls gentrees with each one. Here's a sample output, for three nodes and maximum depth 2 (this function counts depth in links, not nodes):

>>> print('\n'.join(map(str,alltrees("ABC",2))))
(A (C B))
(A (B C))
(A B C)
(B (C A))
(B (A C))
(B A C)
(C (B A))
(C (A B))
(C A B)

Worth noting that the count of generated trees increases exponentially with the number of node labels. Unless you are limiting yourself to very small trees, you should probably make some attempt to prune the generation recursion whenever possible (top-down or bottom-up or both).

Upvotes: 1

Related Questions