aviad
aviad

Reputation: 8278

The most efficient way to test two binary trees for equality

How would you implement in Java the binary tree node class and the binary tree class to support the most efficient (from run-time perspective) equal check method (also has to be implemented):

    boolean equal(Node<T> root1, Node<T> root2) {}

or

    boolean equal(Tree t1, Tree t2) {}

First, I created the Node class as follows:

    public class Node<T> {
        private Node<T> left;
        private Node<T> right;
        private T data;

        // standard getters and setters
    }

and then the equals method that takes 2 root nodes as an arguments and runs the standard recursive comparison:

    public boolean equals(Node<T> root1, Node<T> root2) {
        boolean rootEqual = false;
        boolean lEqual = false;
        boolean rEqual = false;    

        if (root1 != null && root2 != null) {
            rootEqual = root1.getData().equals(root2.getData());

            if (root1.getLeft()!=null && root2.getLeft() != null) {
                // compare the left
                lEqual = equals(root1.getLeft(), root2.getLeft());
            }
            else if (root1.getLeft() == null && root2.getLeft() == null) {
                lEqual = true;
            }
            if (root1.getRight() != null && root2.getRight() != null) {
                // compare the right
                rEqual = equals(root1.getRight(), root2.getRight());
            }
            else if (root1.getRight() == null && root2.getRight() == null) {
                rEqual = true;
            }

            return (rootEqual && lEqual && rEqual);
        }
        return false;
    } 

My second attempt was to implement the trees using arrays and indexes for traversing. Then the comparison could be done using the bitwise operations (AND) on two arrays - read chunk from 2 arrays and mask one by another using logical AND. I failed to get my code working so I do not post it here (I'd appreciate your implementation of the second idea as well as your improvements).

Any thoughts how to do equality test for binary trees most efficiently?

EDIT

The question assumes structural equality. (Not the semantic equality)

However, code that tests the semantic equality e.g. "Should we consider the two trees to be equal if their contents are identical, even if their structure is not?" Would be just iterating over the tree in-order and it should be straightforward.

Upvotes: 19

Views: 43418

Answers (5)

Gerry Howser
Gerry Howser

Reputation: 1

The code given above would return true for two unequal trees with the same root values. I don't think this is what you want. Shouldn't it be:

if (!a==b) return false;

That way the method would perform the rest of the checks.

(Can't log in from here for some reason.)

Upvotes: 0

mikera
mikera

Reputation: 106351

First of all I'm making a few general assumptions. These are assumptions that are valid for most tree-based collection classes but it's always worth checking:

  1. You consider two trees to be equal if and only if they are equal both in terms of tree structure and in terms of data values at each node (as defined by data.equals(...))
  2. null data values are allowed at tree nodes (this could be either because you allow null explicitly or because your data structure only stores non-null values at leaf nodes)
  3. There aren't any particular unusual facts you know about the distribution of data values that you can take advantage of (for example, if you knew that the only possible data vales were null or the String "foo", then you don't need to compare two non-null String values)
  4. The trees will typically be of moderate size and reasonably well balanced. In particular, this ensures that the trees will never be so deep that you run the risk of StackOverflowExceptions caused by deep recursion.

Assuming these assumptions are correct, then the approach I would suggest is:

  • Do root reference equality check first. this quickly eliminates the case of either two nulls or the same tree being passed in for comparison with itself. Both are very common cases, and the reference equality check is extremely cheap.
  • Check the nulls next. Non-null is obviously not equal to null, which enables you to bail out early plus it establishes a non-null guarantee for later code! A very smart compiler could also theoretically use this guarantee to optimise away null pointer checks later (not sure if the JVM currently does this)
  • Check data reference equality and nulls next. This avoids descending all the way down the tree branches which you would do even in the case of unequal data if you went down the tree branches first.
  • Check data.equals() next. Again you want to check data equality before tree branches. You do this after checking for nulls since data.equals() is potentially more expensive and you want to guarantee you won't get a NullPointerException
  • Check the equality of branches recursively as the last step. It doesn't matter if you do left or right first unless there is a greater likelihood of one side being unequal, in which case you should check that side first. This might be the case if e.g. most changes were being appended to the right branch of the tree....
  • Make the comparison a static method. This is because you want to use it recursively in a way that will accept nulls as either of the two parameters (hence it isn't suitable for an instance method as this cannot be null). In addition, the JVM is very good at optimising static methods.

My implementation would therefore be something like:

public static boolean treeEquals(Node a, Node b) {
    // check for reference equality and nulls
    if (a == b) return true; // note this picks up case of two nulls
    if (a == null) return false;
    if (b == null) return false;

    // check for data inequality
    if (a.data != b.data) {
        if ((a.data == null) || (b.data == null)) return false;
        if (!(a.data.equals(b.data))) return false;
    }

    // recursively check branches
    if (!treeEquals(a.left, b.left)) return false;
    if (!treeEquals(a.right, b.right)) return false;

    // we've eliminated all possibilities for non-equality, so trees must be equal
    return true;
}

Upvotes: 22

Jon Skeet
Jon Skeet

Reputation: 1500165

Well for one thing you're always checking the branches, even if you spot that the roots are unequal. Your code would be simpler (IMO) and more efficient if you just returned false as soon as you spotted an inequality.

Another option to simplify things is to allow your equals method to accept null values and compare two nulls as being equal. That way you can avoid all those nullity checks in the different branches. This won't make it more efficient, but it'll be simpler:

public boolean equals(Node<T> root1, Node<T> root2) {
    // Shortcut for reference equality; also handles equals(null, null)
    if (root1 == root2) {
        return true;
    }
    if (root1 == null || root2 == null) {
        return false;
    }
    return root1.getData().equals(root2.getData()) &&
           equals(root1.getLeft(), root2.getLeft()) &&
           equals(root1.getRight(), root2.getRight());
} 

Note that currently this will fail if root1.getData() returns null. (That may or may not be possible with the way you're adding nodes.)

EDIT: As discussed in comments, you could use hash codes to make a very quick "early out" - but it would add complexity.

Either you need to make your trees immutable (which is a whole other discussion) or you need each node to know about its parent, so that when the node is changed (e.g. by adding a leaf or changing the value) it needs to update its hash code and ask its parent to update too.

Upvotes: 34

Hounshell
Hounshell

Reputation: 5459

Out of curiosity, do you consider the two trees to be equal if their contents are identical, even if their structure is not? For example, are these equal?

  B         C        C      A
 / \       / \      / \      \
A   D     B   D    A   D      B
   /     /          \          \
  C     A            B          C
                                 \
                                  D

These trees have the same contents in the same order, but because the structures are different, by your tests would not be equal.

If you want to test this equality, personally I'd just build an iterator for the tree using in-order traversal and iterate through the trees comparing them element by element.

Upvotes: 27

Ivaylo Strandjev
Ivaylo Strandjev

Reputation: 70929

For any tree the most efficient way to represent it so that you can easily check for equality is the parent list - hold an array in which for each vertex you remember the index of its parent(actually hold a pair - the index of the father and the data value). Then you simply should do a compare of two continuous blocks of memory.

This will only work if the tree is static(i.e does not change over time). Also it will only consider the trees to be equal if the vertices indexes are the same in the two trees.

I believe in the common case when the two statements above are not true, your implementation should be about as fast as you can get.

EDIT: in fact your implementation can be improved if you follow the advises in Jon Skeet's answer(at least return false as soon as you know the trees are not equal)

Upvotes: 3

Related Questions