Joohwan
Joohwan

Reputation: 2512

Checking if A is a part of binary tree B

Let's say I have binary trees A and B and I want to know if A is a "part" of B. I am not only talking about subtrees. What I want to know is if B has all the nodes and edges that A does.

My thoughts were that since tree is essentially a graph, and I could view this question as a subgraph isomorphism problem (i.e. checking to see if A is a subgraph of B). But according to wikipedia this is an NP-complete problem.

http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem

I know that you can check if A is a subtree of B or not with O(n) algorithms (e.g. using preorder and inorder traversals to flatten the trees to strings and checking for substrings). I was trying to modify this a little to see if I can also test for just "parts" as well, but to no avail. This is where I'm stuck.

Are there any other ways to view this problem other than using subgraph isomorphism? I'm thinking there must be faster methods since binary trees are much more restricted and simpler versions of graphs.

Thanks in advance!

EDIT: I realized that the worst case for even a brute force method for my question would only take O(m * n), which is polynomial. So I guess this isn't a NP-complete problem after all. Then my next question is, is there an algorithm that is faster than O(m*n)?

Upvotes: 1

Views: 113

Answers (2)

collapsar
collapsar

Reputation: 17238

you could compare the trees in bottom-up as follows:

  • for each leaf in tree A, identify the corresponding node in tree B.
  • start a parallel traversal towards the root in both trees from the nodes just matched. specifically, move to the parent of a node in A and subsequently move towards the root in B until you either encounter the corresponding node in B (proceed) or a marked node in A (see below, if a match in B is found proceed, else fail) or the root of B (fail)
  • mark all nodes visited in A.

you succeed, if you haven't failed ;-).

the main part of the algorithm runs in O(e_B) - in the worst case, all edges in B are visited a constant number of times. the leaf node matching will run in O(n_A * log n_B) if there the B vertices are sorted, O(n_A * log n_A + n_B * log n_B + n) = O(n_B * log n_B) (sort each node set, lienarly scan the results thereafter) otherwise.

EDIT: re-reading your question, abovementioned step 2 is even easier, as for matching nodes in A, B, their parents must match too (otheriwse there would be a mismatch between the edge sets). no effect on worst-case run time, of course.

Upvotes: 0

Vincent van der Weele
Vincent van der Weele

Reputation: 13177

I would approach this problem in two steps:

  1. Find the root of A in B (either BFS of DFS)
  2. Verify that A is contained in B (giving that starting node), using a recursive algorithm, as below (I concocted same crazy pseudo-language, because you didn't specify the language. I think this should be understandable, no matter your background). Note that a is a node from A (initially the root) and b is a node from B (initially the node found in step 1)

function checkTrees(node a, node b) returns boolean
    if a does not exist or b does not exist then
        // base of the recursion
        return false
    else if a is different from b then
        // compare the current nodes
        return false
    else
        // check the children of a
        boolean leftFound = true
        boolean rightFound = true

        if a.left exists then
            // try to match the left child of a with
            // every possible neighbor of b
            leftFound = checkTrees(a.left, b.left)
                       or checkTrees(a.left, b.right)
                       or checkTrees(a.left, b.parent)

        if a.right exists then
            // try to match the right child of a with
            // every possible neighbor of b
            leftFound = checkTrees(a.right, b.left)
                       or checkTrees(a.right, b.right)
                       or checkTrees(a.right, b.parent)

        return leftFound and rightFound

About the running time: let m be the number of nodes in A and n be the number of nodes in B. The search in the first step takes O(n) time. The running time of the second step depends on one crucial assumption I made, but that might be wrong: I assumed that every node of A is equal to at most one node of B. If that is the case, the running time of the second step is O(m) (because you can never search too far in the wrong direction). So the total running time would be O(m + n).

While writing down my assumption, I start to wonder whether that's not oversimplifying your case...

Upvotes: 2

Related Questions