Reputation: 3
I encountered a performance issue when running a large data set. To make it simple, I reduced it to the following code:
double *a = new double();
for (int i = 0; i < 1000000; i++){
    double x = 0;
    double y = 0;
    for (int j = 0; j < 1000; j++){
        x = 1000;
        y++;
    }
    *a = x; //*a = y;
}
This takes nearly 0 ms. But if I assign y to *a instead:
double *a = new double();
for (int i = 0; i < 1000000; i++){
    double x = 0;
    double y = 0;
    for (int j = 0; j < 1000; j++){
        x = 1000;
        y++;
    }
    *a = y; //*a = x;
}
This takes 763 ms, which is significantly longer than the first case. I figured out that this is caused by the relatively more complex computation of y in the inner loop, but I don't know why that matters. If I change
*a = y;
to
double temp = y;
*a = temp;
this still costs nearly 763 ms. It seems I just cannot assign the value of y to *a efficiently, no matter how I transfer the value. Can anyone explain why y is treated so differently from x after the inner loop finishes? And why does it still take a long time to assign the value to *a even after copying y into a temporary variable? (By the way, there is no difference between assigning y's value and x's value if a is a double instead of a pointer to double.)
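For reference, a minimal timing harness along these lines (a sketch; printing *a at the end keeps the stored value observable) can be used to time the two variants:

#include <chrono>
#include <cstdio>

int main() {
    double *a = new double();
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; i++){
        double x = 0;
        double y = 0;
        for (int j = 0; j < 1000; j++){
            x = 1000;
            y++;
        }
        *a = y; // switch to *a = x; to compare the two cases
    }
    auto t1 = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    printf("%lld ms, *a = %f\n", (long long)ms, *a);
    delete a;
    return 0;
}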
Upvotes: 0
Views: 79
Reputation: 1991
In the case where you assign *a = y, the program needs to finish the loop to know what value to assign to a.

In the other case, you never compute anything with x; you always assign it the same constant. That assignment can be taken out of the loop, and the loop itself is then never actually executed, because it has no effect on the outside world.
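As a rough check (a minimal sketch): declare y as volatile, so every access to it counts as observable behavior. The compiler then has to keep the inner loop, and this variant becomes slow even though *a is only ever assigned the constant:

double *a = new double();
for (int i = 0; i < 1000000; i++){
    volatile double y = 0;           // every load/store of y is observable
    for (int j = 0; j < 1000; j++){
        y = y + 1;                   // forces a real load and store each time
    }
    *a = 1000;                       // same constant store as before
}
delete a;

If this runs about as slowly as the *a = y version, that supports the explanation: the cost was never in the assignment to *a, but in whether the loop's work could be discarded.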
Upvotes: 0
Reputation: 275405
double *a = new double();
// OUTER LOOP:
for (int i = 0; i < 1000000; i++){
    double x = 0;
    double y = 0;
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
        x = 1000;
        y++;
    }
    *a = x; //*a = y;
}
In your OUTER LOOP, you repeatedly assign *a to be x or y. In your INNER LOOP, you either set x to 1000 repeatedly, or you increase y 1000 times.
Now, the compiler knows that x = 1000 followed by x = 1000 is equivalent to doing it once. So it is really easy to optimize your code as follows:
double *a = new double();
// OUTER LOOP:
for (int i = 0; i < 1000000; i++){
    constexpr double x = 1000;
    double y = 0;
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
        y++;
    }
    *a = x; //*a = y;
}
then
for (int i = 0; i < 1000000; i++){
    double y = 0;
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
        y++;
    }
    *a = 1000; //*a = y;
}
and then
for (int i = 0; i < 1000000; i++){
    double y = 0;
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
        y++;
    }
    //*a = y;
}
*a = 1000;
because each of those operations is legal. Once that is done, all of the work you do with y has no side effects (as in this case we never assign it to *a), so the variable y gets eliminated:
for (int i = 0; i < 1000000; i++){
    // INNER LOOP:
    for (int j = 0; j < 1000; j++){
    }
}
*a = 1000;
which makes those loops empty. And empty loops can be eliminated (the compiler doesn't even have to prove they terminate!), leaving this:
*a = 1000;
On the other hand, doing y++ 1000 times is not, in general, the same as doing y += 1000, due to the possibility that y started out large enough that floating point rounding causes problems. In this particular case the two are equivalent, as no rounding occurs when adding +1 to 0 a thousand times, but that is not true in general. Because proving that no rounding will occur is hard -- and getting it perfectly right is harder -- the compiler writer probably didn't handle this case.
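To illustrate (a minimal sketch; the starting value 2^53 is mine, chosen because from that magnitude on a double can no longer represent every integer): adding 1.0 a thousand times and adding 1000.0 once produce different results there:

#include <cstdio>

int main() {
    double big = 9007199254740992.0;  // 2^53: from here, doubles step by 2

    double a = big;
    for (int i = 0; i < 1000; i++)
        a = a + 1.0;                  // each +1 rounds back down to 2^53

    double b = big + 1000.0;          // one exactly representable addition

    printf("%.0f\n%.0f\n", a, b);     // prints two different numbers
}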
That leaves a much more complex piece of code to optimize, and it is hard for the compiler to determine that each iteration of the outer loop does exactly the same thing, so what you have observed is the optimizer failing.
To know how much it can optimize in this case, we would have to examine the disassembly.
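If you want to see for yourself, compile with optimizations and dump the assembly (for example, g++ -O2 -S file.cpp with GCC), or paste both variants into Compiler Explorer at https://godbolt.org and compare them side by side; in the *a = x version you would expect the loops to be gone entirely.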
Upvotes: 6