loremIpsum1771
loremIpsum1771

Reputation: 2527

Running time to check if a binary tree is subtree of another binary tree

I've come across a naive solution for the problem of checking if a binary tree is subtree of another binary tree:

Given two binary trees, check if the first tree is subtree of the second one. A subtree of a tree T is a tree S consisting of a node in T and all of its descendants in T. The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree.

For example, in the following case, tree S is a subtree of tree T:

 Tree 2

      10  
    /    \ 
  4       6
   \
    30

    Tree 1
          26
        /   \
      10     3
    /    \     \
  4       6      3
   \
    30

The solution is to traverse the tree T in preorder fashion. For every visited node in the traversal, see if the subtree rooted with this node is identical to S.

It is said in the post that the algorithm has a running time of n^2 or O(m*n) in the worst case where m and n are the sizes of both trees involved.

The point of confusion here is that, if we are traversing through both trees at the same time, in the worst case, it would seem that you would simply have to recurse through all of the nodes in the larger tree to find the subtree. So how could this version of the algorithm (not this one) have a quadratic running time?

Upvotes: 1

Views: 717

Answers (2)

mtszkw
mtszkw

Reputation: 2783

Well, basically in the isSubTree() function you only traverse T tree (the main one, not a subtree). You do nothing with S, so in the worst case this function would be executed for every node in T. However (in the worst case) for each execution, it will check if areIdentical(T, S), which in the worst case has to fully traverse one of the given trees (till one of those is zero-sized).

Trees passed to areIdentical() function are obviously smaller and smaller, but in this case it doesn't matter if it comes to time complexity. Either way this gives you O(n^2) or O(n*m) (where n,m - number of nodes in those trees).

Upvotes: 2

Malcolm McLean
Malcolm McLean

Reputation: 6406

To solve reasonably optimally, flatten the two trees. Using Lisp notation, we get

(10 (4(30) (6))

and

(26 (10 (4(30) (6)) (3 (3))

So the subtree is a substring of the parent. Using strstr we can complete normally in O(N) time, it might take a little bit longer if we have lots and lots of near sub-trees. You can use a suffix tree if you need to do lots of searches and that gets it down to O(M) time where M is the size of the subtree.

But actually runtime doesn't improve. It's the same algorithm, and it will have N M behaviour if, for example, all the trees have the same node id and structure, except for the last right child of the query sub-tree. it's just that the operations become a lot faster.

Upvotes: 0

Related Questions