Reputation: 3
I encountered a performance issue when running a large data set. To make it simple, I reduced it to the following code:
double *a = new double();
for (int i = 0; i < 1000000; i++){
    double x = 0;
    double y = 0;
    for (int j = 0; j < 1000; j++){
        x = 1000;
        y++;
    }
    *a = x; //*a = y;
}
This takes nearly 0 ms. But if I assign y to *a instead:
double *a = new double();
for (int i = 0; i < 1000000; i++){
    double x = 0;
    double y = 0;
    for (int j = 0; j < 1000; j++){
        x = 1000;
        y++;
    }
    *a = y; //*a = x;
}
This takes 763 ms, which is significantly longer than the first case. I figured out that this is caused by the relatively more complex computation of y in the inner loop, but I don't know why that matters. If I change
*a = y;
to
double temp = y;
*a = temp;
this still costs nearly 763 ms. It seems I just cannot assign the value of y to *a efficiently, no matter how I transfer the value. Can anyone explain why y is treated so differently from x after the inner loop finishes? And why does it still take a long time to assign the value to *a even after copying y into a temporary variable? (By the way, there is no difference between assigning y's value and x's value if a is a double instead of a pointer to double.)
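For reference, a minimal timing harness along these lines (a sketch; printing *a at the end keeps the stored value observable) can be used to time the two variants:

#include <chrono>
#include <cstdio>

int main() {
    double *a = new double();
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; i++){
        double x = 0;
        double y = 0;
        for (int j = 0; j < 1000; j++){
            x = 1000;
            y++;
        }
        *a = y; // switch to *a = x; to compare the two cases
    }
    auto t1 = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    printf("%lld ms, *a = %f\n", (long long)ms, *a);
    delete a;
    return 0;
}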
Upvotes: 0
Views: 79
Reputation: 1991
In the case where you assign *a = y, the program needs to finish the loop to know what value to assign to a.

In the other case, you never compute anything with x; you always assign it the same constant. That assignment can be taken out of the loop, and the loop itself is then never actually executed, because it has no effect on the outside world.
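As a rough check (a minimal sketch): declare y as volatile, so every access to it counts as observable behavior. The compiler then has to keep the inner loop, and this variant becomes slow even though *a is only ever assigned the constant:

double *a = new double();
for (int i = 0; i < 1000000; i++){
    volatile double y = 0;           // every load/store of y is observable
    for (int j = 0; j < 1000; j++){
        y = y + 1;                   // forces a real load and store each time
    }
    *a = 1000;                       // same constant store as before
}
delete a;

If this runs about as slowly as the *a = y version, that supports the explanation: the cost was never in the assignment to *a, but in whether the loop's work could be discarded.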
Upvotes: 0
Reputation: 275405
double *a = new double();
// OUTER LOOP:
for (int i = 0; i < 1000000; i++){
    double x = 0;
    double y = 0;
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
        x = 1000;
        y++;
    }
    *a = x; //*a = y;
}
In your OUTER LOOP, you repeatedly assign *a to be x or y. In your INNER LOOP, you either set x to 1000 repeatedly, or you increase y 1000 times.
Now, the compiler knows that x = 1000 followed by x = 1000 is equivalent to doing it once. So it is really easy to optimize your code as follows:
double *a = new double();
// OUTER LOOP:
for (int i = 0; i < 1000000; i++){
    constexpr double x = 1000;
    double y = 0;
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
        y++;
    }
    *a = x; //*a = y;
}
then
for (int i = 0; i < 1000000; i++){
    double y = 0;
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
        y++;
    }
    *a = 1000; //*a = y;
}
and then
for (int i = 0; i < 1000000; i++){
    double y = 0;
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
        y++;
    }
    //*a = y;
}
*a = 1000;
because each of those operations is legal. Once that is done, all of the work you do with y has no side effects (as in this case we never assign it to *a), so the variable y gets eliminated:
for (int i = 0; i < 1000000; i++){
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
    }
}
*a = 1000;
which makes those loops empty. And empty loops can be eliminated (the compiler doesn't even have to prove they terminate!), leaving this:
*a = 1000;
On the other hand, doing y++ 1000 times is not, in general, the same as doing y += 1000, due to the possibility that y started out large enough that floating point rounding causes problems. In this particular case the two are equivalent, as no rounding occurs when adding +1 to 0 a thousand times, but that is not true in general. Because proving that no rounding will occur is hard -- and getting it perfectly right is harder -- the compiler writer probably didn't handle this case.
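To illustrate (a minimal sketch; the starting value 2^53 is mine, chosen because from that magnitude on a double can no longer represent every integer): adding 1.0 a thousand times and adding 1000.0 once produce different results there:

#include <cstdio>

int main() {
    double big = 9007199254740992.0;  // 2^53: from here, doubles step by 2

    double a = big;
    for (int i = 0; i < 1000; i++)
        a = a + 1.0;                  // each +1 rounds back down to 2^53

    double b = big + 1000.0;          // one exactly representable addition

    printf("%.0f\n%.0f\n", a, b);     // prints two different numbers
}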
That leaves a much more complex piece of code to optimize, and it is hard for the compiler to determine that each iteration of the outer loop does exactly the same thing, so what you have observed is the optimizer failing.
To know how much it can optimize in this case, we would have to examine the disassembly.
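If you want to see for yourself, compile with optimizations and dump the assembly (for example, g++ -O2 -S file.cpp with GCC), or paste both variants into Compiler Explorer at https://godbolt.org and compare them side by side; in the *a = x version you would expect the loops to be gone entirely.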
Upvotes: 6