Reputation: 391
In code and the results below, We can see that “Traverse2” is much faster than "Traverse1", indeed they just traverse the same number of elements.
1.How does this difference happened?
2.Putting longer interation inside shorter interation will have a better performance?
public class TraverseTest {
public static void main(String[] args)
{
int a[][] = new int[100][10];
System.out.println(System.currentTimeMillis());
//Traverse1
for(int i = 0; i < 100; i++)
{
for(int j = 0; j < 10; j++)
a[i][j] = 1;
}
System.out.println(System.currentTimeMillis());
//Traverse2
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 100; j++)
a[j][i] = 2;
}
System.out.println(System.currentTimeMillis());
}
}
Result:
1347116569345
1347116569360
1347116569360
If i change it to
System.out.println(System.nanoTime());
The result will be:
4888285195629
4888285846760
4888285914219
It means that if we put longer interation inside will have a better performance. And it seems to have some conflicts with cache hits theory.
Upvotes: 3
Views: 1453
Reputation: 11
In my point of view, size of array also affects the result. Like:
public class TraverseTest {
public static void main(String[] args)
{
int a[][] = new int[10000][2];
System.out.println(System.currentTimeMillis());
//Traverse1
for(int i = 0; i < 10000; i++)
{
for(int j = 0; j < 2; j++)
a[i][j] = 1;
}
System.out.println(System.currentTimeMillis());
//Traverse2
for(int i = 0; i < 2; i++)
{
for(int j = 0; j < 10000; j++)
a[j][i] = 2;
}
System.out.println(System.currentTimeMillis());
}
}
Traverse1 needs 10000*3+1 = 30001 comparisons to decide whether to exit the iteration, however Traverse2 only needs 2*10001+1 = 20003 comparisons.
Traverse1 needs 1.5 times then number of comparisons of Traverse2.
Upvotes: 1
Reputation: 11920
My output(with you original code 100i/10j vs 10i/100j ):
1347118083906
1347118083906
1347118083906
You are using a very bad time resolution for a very quick calculation.
I changed the i and j limit to 1000 both.
int a[][] = new int[1000][1000];
System.out.println(System.currentTimeMillis());
//Traverse1
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j < 1000; j++)
a[i][j] = 1;
}
System.out.println(System.currentTimeMillis());
//Traverse2
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j < 1000; j++)
a[j][i] = 2;
}
System.out.println(System.currentTimeMillis());
output:
1347118210671
1347118210687 //difference is 16 ms
1347118210703 //difference is 16 ms again -_-
Two possibilities:
So i changed output as System.nanoTime()
int a[][] = new int[1000][1000];
System.out.println(System.nanoTime());
//Traverse1
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j < 1000; j++)
a[i][j] = 1;
}
System.out.println(System.nanoTime());
//Traverse2
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j < 1000; j++)
a[j][i] = 2;
}
System.out.println(System.nanoTime());
Output:
16151040043078
16151047859993 //difference is 7800000 nanoseconds
16151061346623 //difference is 13500000 nanoseconds --->this is half speed
1.How does this difference happened?
Note that even ommiting you just used wrong time-resolution, you are making wrong comparations vs inequal cases. First is contiguous-access while second is not.
Lets say first nested loops are just a heating-preparing for the second one then it would make your assumption of "second is much faster" even more wrong.
Dont forget that 2D-array is an "array of arrays" in java. So, the right-most index would show a contiguous area. Faster for the first version.
2.Putting longer interation inside shorter interation will have a better performance?
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 100; j++)
a[j][i] = 2;
}
Increasing the first index is slower because the next iteration goes kbytes away so you cannot use your cache-line anymore.
Absolutely not!
Upvotes: 1
Reputation: 718866
I suspect that any strangeness in the results you are seeing in this micro-benchmark are due to flaws in the benchmark itself.
For example:
Your benchmark does not take account of "JVM warmup" effects, such as the fact that the JIT compiler does not compile to native code immediately. (This only happens after the code has executed for a bit, and the JVM has measured some usage numbers to aid optimization.) The correct way to deal with this is to put the whole lot inside a loop that runs a few times, and discard any initial sets of times that that look "odd" ... due to warmup effects.
The loops in your benchmark could in theory be optimized away. The JIT compiler might be able to deduce that they don't do any work that affects the program's output.
Finally, I'd just like to remind you that hand-optimizing like this is usually a bad idea ... unless you've got convincing evidence that it is worth your while hand-optimizing AND that this code is really where the application is spending significant time.
Upvotes: 2
Reputation: 13535
First, always run microbenchmark tests several times in a loop. Then you'll see both times are 0, as the array sizes are too small. To get non-zero times, increase array sizes in 100 times. My times are roughly 32 ms for Traverse1 and 250 for Traverse2. The difference is because processor use cache memory. Access to sequential memory addresses is much faster.
Upvotes: 1