dynamitem

Reputation: 1669

Time complexity of Arrays.deepEquals() in Java

I'm trying to figure out the time complexity for the java.util.Arrays deepEquals() method.

From the source code I could see that the equals() method runs in O(n) time, but the time complexity of the deepEquals() method is not as easy to deduce. It runs in a loop, but it also calls the deepEquals0 method, which apparently checks the elements for equality recursively. So what would the worst-case scenario be here?

Here's the snippet which was taken from the java.util.Arrays class:

public static boolean deepEquals(Object[] a1, Object[] a2) {
    if (a1 == a2)
        return true;
    if (a1 == null || a2==null)
        return false;
    int length = a1.length;
    if (a2.length != length)
        return false;

    for (int i = 0; i < length; i++) {
        Object e1 = a1[i];
        Object e2 = a2[i];

        if (e1 == e2)
            continue;
        if (e1 == null)
            return false;

        // Figure out whether the two elements are equal
        boolean eq = deepEquals0(e1, e2);

        if (!eq)
            return false;
    }
    return true;
}

static boolean deepEquals0(Object e1, Object e2) {
    assert e1 != null;
    boolean eq;
    if (e1 instanceof Object[] && e2 instanceof Object[])
        eq = deepEquals ((Object[]) e1, (Object[]) e2);
    else if (e1 instanceof byte[] && e2 instanceof byte[])
        eq = equals((byte[]) e1, (byte[]) e2);
    else if (e1 instanceof short[] && e2 instanceof short[])
        eq = equals((short[]) e1, (short[]) e2);
    else if (e1 instanceof int[] && e2 instanceof int[])
        eq = equals((int[]) e1, (int[]) e2);
    else if (e1 instanceof long[] && e2 instanceof long[])
        eq = equals((long[]) e1, (long[]) e2);
    else if (e1 instanceof char[] && e2 instanceof char[])
        eq = equals((char[]) e1, (char[]) e2);
    else if (e1 instanceof float[] && e2 instanceof float[])
        eq = equals((float[]) e1, (float[]) e2);
    else if (e1 instanceof double[] && e2 instanceof double[])
        eq = equals((double[]) e1, (double[]) e2);
    else if (e1 instanceof boolean[] && e2 instanceof boolean[])
        eq = equals((boolean[]) e1, (boolean[]) e2);
    else
        eq = e1.equals(e2);
    return eq;
}

Thanks in advance.

Upvotes: 1

Views: 688

Answers (2)

MyStackRunnethOver

Reputation: 5275

I'm gonna push back a bit on the "linear in the number of total objects" idea, because it doesn't actually matter whether the input is an array of Object or an array of Object[] or whatever else. What really matters is how you define your big-O notation. Namely: if you want to relate the big-O of your algorithm to the size of your input, you have to know what "the size of your input" is. You have to define n in O(n). For example:

  1. Your input is a MyObject[] of size n, where each MyObject has an equals method that takes k work, k being a constant. What's the runtime of deepEquals()? Well, you're doing k work, n times, so you get O(n*k); throw out the constant to get O(n). Great!

  2. Next, your input is MyObject[][], where the top-level array is of size n and the nested arrays are size j, a constant. What's the runtime? n subarrays, each of length j, where each MyObject takes k work. O(n*j*k), throw out constants to get O(n) again. NOTE that our input has gotten j times larger but big-O does not change because we assume j is constant, i.e. the question we're asking is "how does the runtime vary with changes in the length of the top-level array".

  3. What if instead, we ask "how does the runtime vary with changes in the length of the top-level array, and with changes in the length of the nested arrays?" Let our input be of size n, with nested arrays of size m, which is not constant. Now we get O(n*m*k); throw out the constant k to get O(n*m). If we require that our input matrix (nested array) be square, i.e. that n = m, our runtime is now O(n^2).

What?????

In #2, n is the length of the top-level array. We're choosing to ignore the fact that the length of the sub-arrays can vary.

In #3, we're internalizing that fact and representing the length of the sub-arrays as m, to get n*m, or n^2 when n = m (see the sketch just below).
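To make #3 concrete, here is a minimal sketch (the class name, sizes, and values are mine, purely for illustration) that builds two n-by-m Integer[][] matrices; in the worst case, when the arrays are equal, deepEquals has to visit all n*m elements:

import java.util.Arrays;

public class DeepEqualsScaling {

    // Builds an n x m matrix filled with the same value everywhere.
    static Integer[][] matrix(int n, int m) {
        Integer[][] result = new Integer[n][m];
        for (Integer[] row : result) {
            Arrays.fill(row, 42);
        }
        return result;
    }

    public static void main(String[] args) {
        Integer[][] a = matrix(1_000, 1_000);
        Integer[][] b = matrix(1_000, 1_000);

        // Worst case: the arrays are equal, so every row (n of them) and every
        // element within each row (m of them) is visited: O(n*m) comparisons.
        System.out.println(Arrays.deepEquals(a, b)); // true
    }
}

Doubling m doubles the work even though the top-level length n is unchanged, which is exactly the difference between #2 and #3.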

We can go further:

If the equals method of a MyObject doesn't take constant time k, but is instead O(p), where p is the size of a collection contained within MyObject, then the runtime of #3 above becomes O(n*m*p), where n*m is the number of MyObjects and p is the size of each MyObject's collection. And you can keep doing this forever.
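As an illustration of that last step, here is a sketch of such an element type (MyObject and its items field are hypothetical, not taken from the question):

import java.util.List;
import java.util.Objects;

// Hypothetical element type whose equals() is not constant time: it delegates
// to List.equals, which compares item by item, so a single comparison costs
// O(p) where p = items.size().
class MyObject {

    private final List<Integer> items;

    MyObject(List<Integer> items) {
        this.items = items;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof MyObject)) {
            return false;
        }
        return items.equals(((MyObject) other).items); // O(p) work
    }

    @Override
    public int hashCode() {
        return Objects.hash(items);
    }
}

Calling deepEquals on an n-by-m array of such objects then costs O(n*m*p), as described above.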


The upshot is that big-O notation is a bound that is valid under certain assumptions. A major part of those assumptions (which we don't often think about) is that every variable (i.e. anything that really can change) not inside the O() parentheses doesn't matter, because it is eclipsed by the things that are inside the parentheses. This means that, depending on whether n is much more important than m, you can say both that #3 is within O(n) and that #3 is within O(n*m), and be correct according to your assumptions.

Upvotes: 2

Zabuzard

Reputation: 25903

The method runs in time linear in the total number of elements. If we denote that number by n, it is O(n).

That sounds better than it is, though. Imagine you have a nested array structure, for example an Object[] holding arrays of different depths:

{                     // Object[]
    { 1, 2, 3 },      // int[]
    { 4, 5 },         // int[]
    {                 // int[][]
        { 6, 7, 8 },  // int[]
        { 9 }         // int[]
    }
}

Then we have 9 int values in total. By n I mean those 9 elements, not the 3 arrays of the outer structure. It runs linear in that n.

Again, I am not talking about outer.length (which is 3); I am talking about the actual number of elements you get if you completely follow down the whole structure, i.e. if you flatten it. It is actually impossible to express the complexity in terms of outer.length, as it is completely unrelated. Here is a small example to demonstrate this:

{
    {
        { 1, 2, 3, 4, ..., 1_000_000 }
    }
}

Here, input.length is just 1, but the actual number of elements is huge. You see, the two are unrelated.
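If you want to compute that flattened n yourself, here is a minimal sketch (the FlattenedCount class and the countLeaves helper are mine, not part of java.util.Arrays) that walks a structure the same way deepEquals does; for brevity it only handles Object[] and int[]. Run on the first example above, it reports 3 for the outer length but 9 leaf elements:

public class FlattenedCount {

    // Counts the leaf elements of an arbitrarily nested array structure,
    // descending into it the same way deepEquals does. Only Object[] and
    // int[] are handled here; the other primitive array types are omitted.
    static long countLeaves(Object element) {
        if (element instanceof Object[]) {
            long sum = 0;
            for (Object child : (Object[]) element) {
                sum += countLeaves(child);
            }
            return sum;
        }
        if (element instanceof int[]) {
            return ((int[]) element).length;
        }
        // A non-array object counts as a single element.
        return 1;
    }

    public static void main(String[] args) {
        Object[] outer = {
            new int[] { 1, 2, 3 },
            new int[] { 4, 5 },
            new Object[] { new int[] { 6, 7, 8 }, new int[] { 9 } }
        };
        System.out.println(outer.length);       // 3
        System.out.println(countLeaves(outer)); // 9
    }
}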


The reason deepEquals0 calls deepEquals again is that, if you have for example an Object[][][][] (4 dimensions), all of those dimensions have to be checked as well. So it really checks all elements.
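A quick sketch to illustrate that (the concrete values are arbitrary):

import java.util.Arrays;

public class FourDimensions {

    public static void main(String[] args) {
        // Two structurally equal 4-dimensional arrays.
        int[][][][] a = { { { { 1, 2 }, { 3 } } }, { { { 4 } } } };
        int[][][][] b = { { { { 1, 2 }, { 3 } } }, { { { 4 } } } };

        // equals() only compares the outermost elements by reference,
        // so it reports false; deepEquals() recurses through every
        // dimension down to the int[] leaves and reports true.
        System.out.println(Arrays.equals(a, b));     // false
        System.out.println(Arrays.deepEquals(a, b)); // true
    }
}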

Upvotes: 3
