C/C++ compiler optimisations: should I prefer creating new variables, re-using existing ones, or avoiding variables altogether?

Question

This is something I've always wondered: is it easier for the compiler to optimise functions where existing variables are re-used, where new (ideally const) intermediate variables are created, or where creating variables is avoided in favour of directly using expressions?

For example, consider the functions below:

// 1. Use expression as and when needed, no new variables
void MyFunction1(int a, int b)
{
    SubFunction1(a + b);
    SubFunction2(a + b);
    SubFunction3(a + b);
}

// 2. Re-use existing function parameter variable to compute
// result once, and use result multiple times.
// (I've seen this approach most in old-school C code)
void MyFunction2(int a, int b)
{
    a += b;
    
    SubFunction1(a);
    SubFunction2(a);
    SubFunction3(a);
}

// 3. Use a new variable to compute result once,
// and use result multiple times.
void MyFunction3(int a, int b)
{
    int sum = a + b;
    
    SubFunction1(sum);
    SubFunction2(sum);
    SubFunction3(sum);
}

// 4. Use a new const variable to compute result once,
// and use result multiple times.
void MyFunction4(int a, int b)
{
    const int sum = a + b;
    
    SubFunction1(sum);
    SubFunction2(sum);
    SubFunction3(sum);
}

My intuition is that:

In this particular situation, function 4 is easiest to optimise because it explicitly states the intention for the use of the data. It is telling the compiler: "We are summing the two input arguments, the result of which will not be modified, and we are passing on the result in an identical way to each subsequent function call." I expect that the value of the sum variable will just be put into a register, and no actual underlying memory access will occur.
Function 1 is the next easiest to optimise, though it requires more inference on the part of the compiler. The compiler must spot that a + b is used in an identical way for each function call, and it must know that the result of a + b is identical each time that expression is used. I would still expect the result of a + b to be put into a register rather than committed to memory. However, if the input arguments were more complicated than plain ints, I can see this being more difficult to optimise (rules on temporaries would apply for C++).
Function 3 is the next easiest after that: the result is not put into a const variable, but the compiler can see that sum is not modified anywhere in the function (assuming that the subsequent functions do not take a mutable reference to it), so it can just store the value in a register similarly to before. This is less likely than in function 4's case, though.
Function 2 gives the least assistance for optimisations, since it directly modifies an incoming function argument. I'm not 100% sure what the compiler would do here: I don't think it's unreasonable to expect it to be intelligent enough to spot that a is not used anywhere else in the function (similarly to sum in function 3), but I wouldn't guarantee it. This could require modifying stack memory depending on how the function arguments are passed in (I'm not too familiar with the ins and outs of how function calls work at that level of detail).

Are my assumptions here correct? Are there more factors to take into account?

EDIT: A couple of clarifications in response to comments:

If C and C++ compilers would approach the above examples in different ways, I'd be interested to know why. I can understand that C++ would optimise things differently depending on what constraints there are on whichever objects might be inputs to these functions, but for primitive types like int I would expect them to use identical heuristics.
Yes, I could compile with optimisations and look at the assembly output, but I don't know assembly, hence I'm asking here instead.

Eric Postpischil · Accepted Answer

Good modern compilers generally do not “care” about the names you use to store values. They perform lifetime analyses of the values and generate code based on that. For example, given:

int x = complicated expression 0;
... code using x
x = complicated expression 1;
... code using x

the compiler will see that complicated expression 0 is used in the first section of code and complicated expression 1 is used in the second section of code, and the name x is irrelevant. The result will be the same as if the code used different names:

int x0 = complicated expression 0;
... code using x0
int x1 = complicated expression 1;
... code using x1

So there is no point in reusing a variable for a different purpose; it will not help the compiler save memory or otherwise optimize.
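As a rough illustration of the point above, here is a minimal sketch in C. The helpers expr0 and expr1 are hypothetical stand-ins for the two "complicated expressions", and the function names are made up for this example. Both functions compute the same result; with optimisation enabled, a modern compiler would typically be expected to treat the two values as having separate lifetimes in either version, though the exact output depends on the compiler and flags.

```c
#include <assert.h>

/* Hypothetical stand-ins for "complicated expression 0" and "1". */
static int expr0(int n) { return n * n + 1; }
static int expr1(int n) { return 3 * n - 2; }

/* Reuses the single name x for two unrelated values. */
int reuse_name(int n)
{
    int x = expr0(n);
    int total = x * 2;      /* code using the first value */
    x = expr1(n);
    total += x + 5;         /* code using the second value */
    return total;
}

/* Gives each value its own (const) name. */
int distinct_names(int n)
{
    const int x0 = expr0(n);
    int total = x0 * 2;     /* code using x0 */
    const int x1 = expr1(n);
    total += x1 + 5;        /* code using x1 */
    return total;
}

/* Sanity check: the two versions agree for a given input. */
void check_equivalent(int n)
{
    assert(reuse_name(n) == distinct_names(n));
}
```

The only difference between the two functions is naming, which is why the second form costs nothing and is easier to read.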

Even if the code were in a loop, such as:

int x;
while (some condition)
{
    x = complicated expression;
    ... code using x
}

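the compiler will see that the value of complicated expression is born at the beginning of the loop body and that its lifetime ends by the end of the loop body, so declaring x outside the loop does not force the value to be kept alive between iterations.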
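To make the loop case concrete, here is a small sketch in C. The function names and the squaring expression are assumptions chosen for illustration, and a for loop stands in for the while loop. In both versions the value held in x lives only within one iteration, so an optimising compiler would be expected to handle them alike; the second version simply states that lifetime in the source.

```c
#include <assert.h>
#include <stddef.h>

/* x is declared outside the loop, as in the snippet above. */
int sum_squares_outer(const int *values, size_t count)
{
    int total = 0;
    int x;
    for (size_t i = 0; i < count; ++i) {
        x = values[i] * values[i];   /* value is born here */
        total += x;                  /* last use in this iteration */
    }
    return total;
}

/* x is declared in the innermost scope where it is needed. */
int sum_squares_inner(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; ++i) {
        const int x = values[i] * values[i];
        total += x;
    }
    return total;
}

/* Sanity check: both versions return the same sum. */
void check_loop_versions(const int *values, size_t count)
{
    assert(sum_squares_outer(values, count) == sum_squares_inner(values, count));
}
```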

What this means is you do not have to worry about what the compiler will do with the code. Instead, your decisions should be guided mostly by what is clearer to write and more likely to avoid bugs:

Avoid reusing a variable for more than one purpose. For example, if somebody is later updating your function to add a new feature, they might miss the fact you have changed the function parameter with a += b; and use a later in the code as if it still contained the original parameter.
Do freely create new variables to hold repeated expressions. int sum = a + b; is fine; it expresses the intent and makes it clearer to readers when the same expression is used in multiple places.
Limit the scope of variables (and identifiers generally). Declare them only in the innermost scope where they are needed, such as inside a loop rather than outside. This avoids a variable being used accidentally where it is no longer appropriate.
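The first guideline can be sketched with a hypothetical pair of functions (the names and the "multiply by the original a" feature are invented for this example). Suppose a later change is meant to return the sum multiplied by the original value of a. In the version that overwrites the parameter, that later code silently uses the sum instead:

```c
#include <assert.h>

/* Overwrites the parameter, then a later edit assumes a is unchanged. */
int scaled_sum_reused(int a, int b)
{
    a += b;          /* a now holds the sum, not the original argument */
    return a * a;    /* intended: sum * original a; actually: sum * sum */
}

/* Keeps the parameter intact and names the sum explicitly. */
int scaled_sum_named(int a, int b)
{
    const int sum = a + b;
    return sum * a;  /* a is still the original argument */
}

/* Shows the two results diverging for a = 2, b = 3. */
void demo_difference(void)
{
    assert(scaled_sum_reused(2, 3) == 25);  /* (2 + 3) * (2 + 3) */
    assert(scaled_sum_named(2, 3) == 10);   /* (2 + 3) * 2 */
}
```

Making sum const also means the compiler will reject an accidental later assignment to it, which is a readability and safety benefit rather than an optimisation.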
