xdavidliu

Reputation: 3042

Why does builtin assignment return a non-const reference instead of a const reference in C++?

(Note: the original question title said "instead of an rvalue" rather than "instead of a const reference". One of the answers below responds to the old title. The title was changed for clarity.)

One common construct in C and C++ is for chained assignments, e.g.

    int j, k;
    j = k = 1;

The second = is performed first, with the expression k=1 having the side effect that k is set to 1, while the value of the expression itself is 1.

However, one construct that is legal in C++ (but not in C) is the following, which is valid for all built-in types:

    int j, k=2;
    (j=k) = 1;

Here, the expression j=k has the side effect of setting j to 2, and the expression itself becomes a reference to j, which is then used to set j to 1. As I understand it, this is because the expression j=k returns a non-const int&, i.e. generally speaking an lvalue.
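To make the two steps concrete, here is a minimal sketch. The function name `assign_then_overwrite` is made up for illustration, and the sketch assumes C++11 or later, where the inner assignment is sequenced before the outer one:

```cpp
#include <cassert>

// Hypothetical helper, not from the question: returns the final value of j.
int assign_then_overwrite() {
    int j = 0, k = 2;
    (j = k) = 1;     // j is first set to 2, then overwritten with 1
    assert(k == 2);  // k is only read, never modified
    return j;
}
```

Calling `assign_then_overwrite()` should yield 1, since the outer assignment targets j itself.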

This convention is usually also recommended for user-defined types, as explained in "Item 10: Have assignment operators return a (non-const) reference to *this" in Meyers' Effective C++ (parenthetical addition mine). That section of the book does not attempt to explain why the reference is a non-const one, or even note the non-constness in passing.

Of course, this certainly adds functionality, but the statement (j=k) = 1; seems awkward to say the least.

If the convention were to instead have builtin assignment return const references, then custom classes would also use this convention, and the original chained construction allowed in C would still work, without any extraneous copies or moves. For example, the following runs correctly:

    #include <iostream>
    using std::cout;

    struct X {
      int k;
      X(int k): k(k) {}
      // the first const goes against convention
      const X& operator=(const X& x) {
        k = x.k;
        return *this;
      }
    };

    int main() {
      X x(1), y(2), z(3);
      x = y = z;
      cout << x.k << '\n'; // prints 3
    }

with the advantage that all three (C builtins, C++ builtins, and C++ custom types) are consistent in not allowing idioms like (j=k) = 1.
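As a rough check of that claim, the sketch below compares a const-returning assignment operator with the conventional one at compile time. The names `ConstRet`, `Conventional`, `AssignResult` and `chained_const_ret` are hypothetical and only mirror the X struct from the question:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical type: assignment returns a const reference, as proposed above.
struct ConstRet {
    int k;
    explicit ConstRet(int k) : k(k) {}
    const ConstRet& operator=(const ConstRet& other) { k = other.k; return *this; }
};

// Hypothetical type: assignment returns a non-const reference (the usual convention).
struct Conventional {
    int k;
    explicit Conventional(int k) : k(k) {}
    Conventional& operator=(const Conventional& other) { k = other.k; return *this; }
};

// Type of the expression `a = b` for lvalues a and b of type T.
template <class T>
using AssignResult = decltype(std::declval<T&>() = std::declval<const T&>());

static_assert(!std::is_assignable<AssignResult<ConstRet>, const ConstRet&>::value,
              "(x = y) = z is rejected for the const-returning version");
static_assert(std::is_assignable<AssignResult<Conventional>, const Conventional&>::value,
              "(x = y) = z compiles for the conventional version");

// Plain chaining still works with the const-returning operator.
int chained_const_ret() {
    ConstRet x(1), y(2), z(3);
    x = y = z;
    assert(y.k == 3);  // y was assigned first
    return x.k;
}
```

Both static assertions are expected to hold, and `chained_const_ret()` should return 3, matching the output of the program above.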

Was the addition of this idiom between C and C++ intentional? And if so, what type of situation would justify its use? In other words, what non-spurious benefit does this expansion in functionality ever provide?

Upvotes: 3

Views: 659

Answers (3)

AnT stands with Russia

Reputation: 320531

By design, one fundamental difference between C and C++ is that C is an lvalue-discarding language and C++ is an lvalue-preserving language.

Before C++98, Bjarne had added references to the language in order to make operator overloading possible. And references, in order to be useful, require that the lvalueness of expressions be preserved rather than discarded.

This idea of preserving lvalueness wasn't really formalized until C++98, though. In the discussions preceding the C++98 standard, it was noted that references require the lvalueness of an expression to be preserved. That requirement was then formalized, and that is when C++ made one major and purposeful break from C and became an lvalue-preserving language.

C++ strives to preserve the "lvalueness" of any expression result whenever possible. This applies to all built-in operators, including the built-in assignment operator. Of course, it is not done to enable writing expressions like (a = b) = c, since their behavior would be undefined (at least under the original C++ standard). But because of this property of C++ you can write code like

    int a, b = 42;
    int *p = &(a = b);

How useful it is is a different question, but again, this is just one consequence of lvalue-preserving design of C++ expressions.
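A small sketch of what that pointer ends up referring to (the helper name `assignment_yields_address_of_a` is made up for illustration):

```cpp
#include <cassert>

// Hypothetical helper: reports whether &(a = b) is the address of a.
bool assignment_yields_address_of_a() {
    int a = 0, b = 42;
    int* p = &(a = b);  // a = b is an lvalue designating a, so & is allowed
    assert(*p == 42);   // the assignment has already taken place
    return p == &a;
}
```

The function should return true. The same line is rejected by a C compiler, because in C the result of an assignment is not an lvalue.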

As for why it is not a const lvalue... Frankly, I don't see why it should be. Like any other lvalue-preserving built-in operator in C++, it just preserves whatever type is given to it.

Upvotes: 6

M.M

Reputation: 141598

Built-in operators don't "return" anything, let alone "return a reference".

Expressions are characterized mainly by two things:

  • their type
  • their value category.

For example k + 1 has type int and value category "prvalue", but k = 1 has type int and value category "lvalue". An lvalue is an expression that designates a memory location, and the location designated by k = 1 is the same location that was allocated by the declaration int k;.

The C Standard only has value categories "lvalue" and "not lvalue". In C, k = 1 has type int and category "not lvalue".
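In C++ the value category of an expression can be inspected with decltype on a parenthesized operand, which yields T& for an lvalue, T&& for an xvalue and plain T for a prvalue. A minimal sketch, where `category`, `category_of_sum`, `category_of_assignment` and `check_categories` are names invented for this illustration:

```cpp
#include <cassert>
#include <string>

// Maps the type produced by decltype((expr)) to a value-category name.
template <class T> struct category      { static std::string name() { return "prvalue"; } };
template <class T> struct category<T&>  { static std::string name() { return "lvalue"; } };
template <class T> struct category<T&&> { static std::string name() { return "xvalue"; } };

std::string category_of_sum() {
    int k = 0;
    return category<decltype((k + 1))>::name();  // operand is not evaluated
}

std::string category_of_assignment() {
    int k = 0;
    return category<decltype((k = 1))>::name();  // operand is not evaluated
}

// Checks the two classifications stated in the text above.
void check_categories() {
    assert(category_of_sum() == "prvalue");
    assert(category_of_assignment() == "lvalue");
}
```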


You seem to be suggesting that k = 1 should have type const int and value category lvalue. Perhaps it could; the language would be slightly different. It would outlaw confusing code, but perhaps outlaw useful code too. This is a decision that's hard for a language designer or design committee to evaluate, because they can't think of every possible way the language could be used.

They err on the side of not introducing restrictions that might turn out to have a problem nobody foresaw yet. A related example is the question "Should implicitly generated assignment operators be & ref-qualified?".
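For readers unfamiliar with ref-qualifiers, here is a rough sketch of what a &-qualified assignment operator changes. The types `Plain` and `LvalueOnly` are hypothetical; the implicitly generated operators behave like `Plain` in this respect:

```cpp
#include <type_traits>

// Hypothetical type with an unqualified assignment operator.
struct Plain {
    Plain& operator=(const Plain&) { return *this; }
};

// Hypothetical type whose assignment operator only accepts an lvalue on the left.
struct LvalueOnly {
    LvalueOnly& operator=(const LvalueOnly&) & { return *this; }
};

static_assert(std::is_assignable<Plain, const Plain&>::value,
              "assigning to a temporary Plain compiles");
static_assert(!std::is_assignable<LvalueOnly, const LvalueOnly&>::value,
              "assigning to a temporary LvalueOnly is rejected");
static_assert(std::is_assignable<LvalueOnly&, const LvalueOnly&>::value,
              "assigning to an LvalueOnly lvalue still compiles");
```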

One possible situation that comes to mind is:

    void foo(int& x);

    int y;
    foo(y = 3);

which would set y to 3 and then invoke foo. This wouldn't be possible under your suggestion. Of course you could argue that y = 3; foo(y); is clearer anyway, but that's a slippery slope: perhaps increment operators shouldn't be allowed inside larger expressions etc. etc.
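A runnable version of that situation might look as follows. The body of `foo` and the wrapper `call_with_assignment` are assumptions added only so the effect is observable:

```cpp
#include <cassert>

// Assumed body for illustration: modifies its argument through the reference.
void foo(int& x) { x += 1; }

// Hypothetical wrapper: returns y after the call.
int call_with_assignment() {
    int y = 0;
    foo(y = 3);      // y = 3 is a non-const lvalue, so it binds to int&
    assert(y == 4);  // foo saw y itself, already set to 3
    return y;
}
```

If the assignment expression had type const int, the call would not compile, because a const lvalue cannot bind to int&.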

Upvotes: 0

kraskevich

Reputation: 18556

I'll answer the question in the title.

Let's assume that it returned an rvalue reference. It wouldn't be possible to return a reference to a newly assigned object this way (because it's an lvalue). If it's not possible to return a reference to a newly assigned object, one needs to create a copy. That would be terribly inefficient for heavy objects, for instance containers.

Consider an example of a class similar to std::vector.

With the current return type, the assignment works this way (I'm deliberately not using templates or the copy-and-swap idiom, to keep the code as simple as possible):

    class vector {
    public:
        vector& operator=(const vector& other) {
            // Do some heavy internal copying here.
            // No copy here: I just effectively return this.
            return *this;
        }
    };

Let's assume that it returned an rvalue:

    class vector {
    public:
        vector operator=(const vector& other) {
            // Do some heavy stuff here to update this.
            // A copy must happen here again.
            return *this;
        }
    };
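The cost difference between the two signatures can be made visible by counting copy-constructor calls. This is only a sketch: `ByRef`, `ByValue` and the two helper functions are hypothetical stand-ins for the vector-like class above, with the heavy copying replaced by a counter:

```cpp
#include <cassert>

// Hypothetical stand-in: assignment returns a reference.
struct ByRef {
    static int copies;
    ByRef() = default;
    ByRef(const ByRef&) { ++copies; }
    ByRef& operator=(const ByRef&) { return *this; }  // no copy on return
};
int ByRef::copies = 0;

// Hypothetical stand-in: assignment returns by value.
struct ByValue {
    static int copies;
    ByValue() = default;
    ByValue(const ByValue&) { ++copies; }
    ByValue operator=(const ByValue&) { return *this; }  // copies *this on return
};
int ByValue::copies = 0;

// Number of copy-constructor calls made by a = b = c.
int copies_by_ref() {
    ByRef::copies = 0;
    ByRef a, b, c;
    a = b = c;
    return ByRef::copies;
}

int copies_by_value() {
    ByValue::copies = 0;
    ByValue a, b, c;
    a = b = c;
    assert(ByValue::copies > 0);  // each assignment returned a fresh copy
    return ByValue::copies;
}
```

With these definitions the reference-returning chain performs no copy construction at all, while the by-value chain performs one per assignment.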

You might think about returning an rvalue reference, but that wouldn't work either: you can't just move from *this (otherwise, a chain of assignments a = b = c would ruin b), so a second copy would also be required to return it.

The question in the body of your post is different: returning a const vector& is indeed possible without any of the complications shown above, so it looks more like a convention to me.

Note: the title of the question refers to built-ins, while my answer covers custom classes. I believe that it's about consistency. It would be quite surprising if it acted differently for built-in and custom types.

Upvotes: 1
