Eduardo Juarez
Eduardo Juarez

Reputation: 61

Find if the first tree is a subset of the second tree

I checked all around the internet to find out how to check if a tree is a subSet of another. By subset, I mean The issubset function should return 1 if all the elements of the first tree appear in the second tree and 0 otherwise. Note that depending on the insertion order of elements, two trees with same set of elements can have very different shapes. Given the following examples as trees:

Elements of the first tree
             4
           /  \
         2     6
        / \   / \
       1   2  5  7 


Elements of the second Tree
            6
          /   \
         4     7
        /  \
       2    5
      / \
     1   2    

The following code traverses the trees then checks the values:

int issubset(nodeT **tree1, nodeT **tree2) {

    if( *tree2 == NULL)
        return TRUE;
    if(*tree1 == NULL)
        return FALSE;

    if(are_identical(&(*tree1),&(*tree2)))
        return TRUE;

    return issubset(&(*tree1)->pLeft, &(*tree2)) || issubset(&(*tree2)->pRight, &(*tree2));
}

int are_identical(nodeT **tree1, nodeT **tree2) {
    nodeT **temp;
    int iFound = 0, i, r;   
    if( *tree2 == NULL)
        return TRUE;

    if( (*tree1) ==NULL && (*tree2) ==NULL) {
        return FALSE;
    }

    if( (*tree1)->iValue != (*tree2)->iValue) {
        if(iFound = 0)
            return TRUE;

        i = issubset(&(*tree1)->pLeft, &(*tree2));
        if( i ==0) {
            r = issubset(&(*tree1)->pRight, &(*tree2));
            return r;
        }

        return i;
    }   

    return((*tree1)->iValue == (*tree2)->iValue && are_identical(&(*tree1)->pLeft, &(*tree2)->pLeft) && 
           are_identical(&(*tree1)->pRight, &(*tree2)->pRight) );
}

After running my code with the given examples, my output gives back that the first tree is not a subset of the second, when it actually is a subset.

Upvotes: 4

Views: 1209

Answers (2)

lrleon
lrleon

Reputation: 2648

I'm not sure I understand your question, but I still try to give you my answer. From your example I assume that you're working with binary search trees. But I do not know what kind of binary tree you're using, I will assume a general scheme. I guess if the tree is somewhat balanced then maybe you can get a better algorithm.

Since you have binary search trees, you could assume a function search(root, key) which returns a valid pointer to a node containing key if this one is found or NULL otherwise.

Also, I assume that you know the numbers of nodes of each tree. So you could return 0 if tree1 has lesser nodes than tree. Otherwise the approach is as follows:

int tree1_contained_in_tree2(node * tree1, node * tree2)
{
  if (tree1 == NULL) // am I visiting a empty tree?
    return 1;

  // so I am sure tree1 is not NULL ==> I search the contained key in tree2
  if (search(tree2, tree1->key) == NULL)
    return 0; // tree1->key does not belong to tree2

  // here tree1->key belongs to tree1 so, I test with the subtrees of root
  return tree1_contained_in_tree2(tree1->left, tree2) && 
         tree1_contained_in_tree2(tree1->right, tree2);    
}

I prefer to use simple pointers to nodes instead of double pointers. I think you can adapt my approach to yours.

The algorithm is O(n log m) if tree2 is balanced (O(n m) otherwise) where n is the number of nodes of tree1 and m the number of nodes of tree2

Upvotes: 3

augurar
augurar

Reputation: 13016

Binary search trees support efficient sorted iteration over the elements. Maintain an iterator over the elements of each tree in nondecreasing order. Repeat the following until a result is determined:

  • If the first iterator has no more elements, return TRUE.
  • If the second iterator has no more elements, return FALSE.
  • If the current element of the first iterator is less than that of the second, return FALSE.
  • If the current element of the first iterator is equal to that of the second, update both iterators.
  • If the current element of the first tree is greater than that of the second, update the second iterator. (You can optimize this by skipping some elements.)

The basic implementation is worst case O(n + m) where n and m are the respective sizes of the two trees. With the optimization mentioned you can additionally bound this by O(n log m) if the larger tree is balanced, which is useful if the second tree is much larger than the first. (Whether or not the trees are balanced, the O(n + m) bound still applies.)

Upvotes: 1

Related Questions