André Harder

Reputation: 394

Are compilers clever enough to std::move variables going out of scope?

Consider the following piece of code:

std::vector<int> Foo() {
    std::vector<int> v = Bar();
    return v;
}

return v is O(1), since NRVO will omit the copy, constructing v directly in the storage where the function's return value would otherwise be moved or copied to. Now consider the functionally analogous code:

void Foo(std::vector<int> * to_be_filled) {
    std::vector<int> v = Bar();
    *to_be_filled = v;
}

A similar argument could be made here: *to_be_filled = v could conceivably be compiled to an O(1) move-assignment, since v is a local variable that's going out of scope (it should be easy enough for the compiler to verify that v has no external references in this case, and thus promote it to an rvalue on its last use). Is this the case? Is there a subtle reason why not?

Furthermore, it feels like this pattern can be extended to any context where an lvalue goes out of scope:

void Foo(std::vector<int> * to_be_filled) {
    if (Baz()) {
        std::vector<int> v = Bar();
        *to_be_filled = v;
    }
    ...
}

Do compilers do this, can they, and is it useful or reasonable to expect them to find patterns such as *to_be_filled = v and automatically optimize them to use rvalue semantics?


Edit:

g++ 7.3.0 does not perform any such optimizations in -O3 mode.

Upvotes: 13

Views: 1355

Answers (1)

Nicol Bolas

Reputation: 473272

The compiler is not permitted to arbitrarily decide to transform an lvalue name into an rvalue to be moved from. It can only do so where the C++ standard permits it, such as in a return statement (and only when it's of the form return <identifier>;).

*to_be_filled = v; will always perform a copy. Even if it's the last statement that can access v, it is always a copy. Compilers aren't allowed to change that.
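
If a move is what you want, you have to ask for it with std::move. A minimal sketch, assuming a hypothetical Tracer type and hypothetical counters g_copies and g_moves (none of which appear in the question), so that the difference becomes observable:

```cpp
#include <utility>

// Hypothetical global counters, for illustration only.
int g_copies = 0;
int g_moves = 0;

// Hypothetical type that records whether it was copy- or move-assigned.
struct Tracer {
    Tracer() = default;
    Tracer(const Tracer&) { ++g_copies; }
    Tracer(Tracer&&) noexcept { ++g_moves; }
    Tracer& operator=(const Tracer&) { ++g_copies; return *this; }
    Tracer& operator=(Tracer&&) noexcept { ++g_moves; return *this; }
};

// v is an lvalue, so overload resolution selects copy assignment,
// even though v is destroyed immediately afterwards.
void FillByCopy(Tracer* to_be_filled) {
    Tracer v;
    *to_be_filled = v;
}

// std::move casts v to an rvalue, so move assignment is selected.
void FillByMove(Tracer* to_be_filled) {
    Tracer v;
    *to_be_filled = std::move(v);
}
```

Calling FillByCopy increments only g_copies, and calling FillByMove increments only g_moves, regardless of optimization level.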

> My understanding is that return v is O(1), since NRVO will (in effect) make v into an rvalue, which then makes use of std::vector's move-constructor.

That's not how it works. NRVO would eliminate the move/copy entirely. But the ability for the identifier in return <identifier>; to be an rvalue is not an "optimization". It's actually a requirement that compilers treat it as an rvalue.

Compilers have a choice about copy elision. Compilers don't have a choice about what return <identifier>; does. So the above will either not move at all (if NRVO happens) or will move the object.
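
One way to see that this is a language rule rather than an optimization is to use a move-only type. The sketch below uses std::unique_ptr purely as an illustration (it is not part of the question), and MakeValue and FillValue are hypothetical names:

```cpp
#include <memory>
#include <utility>

// Returning a local by name compiles even though unique_ptr cannot be copied:
// the operand is either elided (NRVO) or treated as an rvalue and moved.
std::unique_ptr<int> MakeValue() {
    std::unique_ptr<int> p = std::make_unique<int>(42);
    return p;
}

// Plain assignment gets no such treatment; p is an lvalue here.
void FillValue(std::unique_ptr<int>* to_be_filled) {
    std::unique_ptr<int> p = std::make_unique<int>(42);
    // *to_be_filled = p;          // would not compile: copy assignment is deleted
    *to_be_filled = std::move(p);  // the move has to be requested explicitly
}
```

If return p; were allowed to copy, MakeValue would not compile at all.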

> Is there a subtle reason why not?

One reason this isn't allowed is that the location of a statement should not arbitrarily change what that statement is doing. See, return <identifier>; will always treat the identifier as an rvalue (if it's a local variable), so it moves unless the move is elided. It doesn't matter where it is in the function. By virtue of being a return statement, we know that if the return is executed, nothing after it will be executed.

That's not the case for arbitrary statements. The behavior of the expression *to_be_filled = v; should not change based on where it happens to be in code. You shouldn't be able to turn a move into a copy just because you add another line to the function.

Another reason is that arbitrary statements can get really complicated really quickly. return <identifier>; is very simple; it copies/moves the identifier to the return value and returns.

By contrast, what happens if you have a reference to v, and that gets used by to_be_filled somehow? Sure, that can't happen in your case, but what about other, more complex cases? The last expression could conceivably read from a reference to a moved-from object.

It's a lot harder to do that in return <identifier>; cases.
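
To make the aliasing concern concrete, here is a small sketch. FillThenInspect, its move_it flag, and the alias reference are hypothetical names introduced for illustration. The explicit std::move stands in for what an automatic "last use" move would do, and the v.clear() call is only there to make the moved-from state deterministic (the standard leaves it valid but unspecified).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Assigns v to *to_be_filled, then reads v again through a reference.
// The name v never appears after the assignment, yet v is still observed.
std::size_t FillThenInspect(std::vector<int>* to_be_filled, bool move_it) {
    std::vector<int> v = {1, 2, 3};
    const std::vector<int>& alias = v;  // a second way to reach v

    if (move_it) {
        *to_be_filled = std::move(v);   // stand-in for an automatic move
        v.clear();                      // pin down the moved-from state
    } else {
        *to_be_filled = v;              // what the language actually does
    }
    return alias.size();                // reads v without naming it
}
```

With move_it set to false the function returns 3; with it set to true the function returns 0. A compiler that silently picked the move would change observable behavior, and proving that no such alias exists gets hard once the code is less trivial than this.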

Upvotes: 13
