Jingjing Zhong
Jingjing Zhong

Reputation: 391

Traversal performance of multidimensional array in Java

In code and the results below, We can see that “Traverse2” is much faster than "Traverse1", indeed they just traverse the same number of elements.

1.How does this difference happened?

2.Putting longer interation inside shorter interation will have a better performance?

public class TraverseTest {

    public static void main(String[] args)
    {
        int a[][] = new int[100][10];
        System.out.println(System.currentTimeMillis());

        //Traverse1
        for(int i = 0; i < 100; i++)
        {
            for(int j = 0; j < 10; j++)
                a[i][j] = 1;
        }

        System.out.println(System.currentTimeMillis());

        //Traverse2
        for(int i = 0; i < 10; i++)
        {
            for(int j = 0; j < 100; j++)
                a[j][i] = 2;
        }

        System.out.println(System.currentTimeMillis());
    }
}

Result:

1347116569345

1347116569360

1347116569360

If i change it to

System.out.println(System.nanoTime());

The result will be:

4888285195629

4888285846760

4888285914219

It means that if we put longer interation inside will have a better performance. And it seems to have some conflicts with cache hits theory.

Upvotes: 3

Views: 1453

Answers (4)

chouchou
chouchou

Reputation: 11

In my point of view, size of array also affects the result. Like:

public class TraverseTest {

    public static void main(String[] args)
    {
        int a[][] = new int[10000][2];
        System.out.println(System.currentTimeMillis());

        //Traverse1
        for(int i = 0; i < 10000; i++)
        {
            for(int j = 0; j < 2; j++)
                a[i][j] = 1;
        }

        System.out.println(System.currentTimeMillis());

        //Traverse2
        for(int i = 0; i < 2; i++)
        {
            for(int j = 0; j < 10000; j++)
                a[j][i] = 2;
        }

        System.out.println(System.currentTimeMillis());
    }
}

Traverse1 needs 10000*3+1 = 30001 comparisons to decide whether to exit the iteration, however Traverse2 only needs 2*10001+1 = 20003 comparisons.

Traverse1 needs 1.5 times then number of comparisons of Traverse2.

Upvotes: 1

huseyin tugrul buyukisik
huseyin tugrul buyukisik

Reputation: 11920

My output(with you original code 100i/10j vs 10i/100j ):

1347118083906
1347118083906
1347118083906

You are using a very bad time resolution for a very quick calculation.

I changed the i and j limit to 1000 both.

    int a[][] = new int[1000][1000];
    System.out.println(System.currentTimeMillis());

    //Traverse1
    for(int i = 0; i < 1000; i++)
    {
        for(int j = 0; j < 1000; j++)
            a[i][j] = 1;
    }

    System.out.println(System.currentTimeMillis());

    //Traverse2
    for(int i = 0; i < 1000; i++)
    {
        for(int j = 0; j < 1000; j++)
            a[j][i] = 2;
    }

    System.out.println(System.currentTimeMillis());

output:

1347118210671
1347118210687 //difference is 16 ms
1347118210703 //difference is 16 ms again -_-

Two possibilities:

  • Java hotspot changes the second loop into a first-type or optimizes with exchanging i and j.
  • Time resolution is still not enough.

So i changed output as System.nanoTime()

    int a[][] = new int[1000][1000];
    System.out.println(System.nanoTime());

    //Traverse1
    for(int i = 0; i < 1000; i++)
    {
        for(int j = 0; j < 1000; j++)
            a[i][j] = 1;
    }

    System.out.println(System.nanoTime());

    //Traverse2
    for(int i = 0; i < 1000; i++)
    {
        for(int j = 0; j < 1000; j++)
            a[j][i] = 2;
    }

    System.out.println(System.nanoTime());

Output:

16151040043078
16151047859993 //difference is 7800000 nanoseconds
16151061346623 //difference is 13500000 nanoseconds --->this is half speed

1.How does this difference happened?

Note that even ommiting you just used wrong time-resolution, you are making wrong comparations vs inequal cases. First is contiguous-access while second is not.

Lets say first nested loops are just a heating-preparing for the second one then it would make your assumption of "second is much faster" even more wrong.

Dont forget that 2D-array is an "array of arrays" in java. So, the right-most index would show a contiguous area. Faster for the first version.

2.Putting longer interation inside shorter interation will have a better performance?

for(int i = 0; i < 10; i++)
    {
        for(int j = 0; j < 100; j++)
            a[j][i] = 2;
    }

Increasing the first index is slower because the next iteration goes kbytes away so you cannot use your cache-line anymore.

Absolutely not!

Upvotes: 1

Stephen C
Stephen C

Reputation: 718866

I suspect that any strangeness in the results you are seeing in this micro-benchmark are due to flaws in the benchmark itself.

For example:

  • Your benchmark does not take account of "JVM warmup" effects, such as the fact that the JIT compiler does not compile to native code immediately. (This only happens after the code has executed for a bit, and the JVM has measured some usage numbers to aid optimization.) The correct way to deal with this is to put the whole lot inside a loop that runs a few times, and discard any initial sets of times that that look "odd" ... due to warmup effects.

  • The loops in your benchmark could in theory be optimized away. The JIT compiler might be able to deduce that they don't do any work that affects the program's output.

Finally, I'd just like to remind you that hand-optimizing like this is usually a bad idea ... unless you've got convincing evidence that it is worth your while hand-optimizing AND that this code is really where the application is spending significant time.

Upvotes: 2

Alexei Kaigorodov
Alexei Kaigorodov

Reputation: 13535

First, always run microbenchmark tests several times in a loop. Then you'll see both times are 0, as the array sizes are too small. To get non-zero times, increase array sizes in 100 times. My times are roughly 32 ms for Traverse1 and 250 for Traverse2. The difference is because processor use cache memory. Access to sequential memory addresses is much faster.

Upvotes: 1

Related Questions