Géry Ogam

Reputation: 8027

Clarifying the value categories of expressions

In 2010, Bjarne Stroustrup, the creator of C++, wrote the paper “New” Value Terminology, in which he explains the value categories of expressions introduced in the C++11 standard* (lvalue, xvalue, and prvalue, and their generalizations glvalue and rvalue):

There were only two independent properties:

  • “has identity” – i.e. an address, a pointer, the user can determine whether two copies are identical, etc.
  • “can be moved from” – i.e. we are allowed to leave the source of a “copy” in some indeterminate, but valid state

This led me to the conclusion that there are exactly three kinds of values (using the regex notational trick of using a capital letter to indicate a negative – I was in a hurry):

  • iM: has identity and cannot be moved from
  • im: has identity and can be moved from (e.g. the result of casting an lvalue to an rvalue reference)
  • Im: does not have identity and can be moved from

The fourth possibility (“IM”: doesn’t have identity and cannot be moved) is not useful in C++ (or, I think, in any other language). In addition to these three fundamental classifications of values, we have two obvious generalizations that correspond to the two independent properties:

  • i: has identity
  • m: can be moved from

In 2015, Richard Smith, then the C++ standard editor, wrote the paper Guaranteed copy elision through simplified value categories, in which he explains the rewording of the value categories of expressions introduced in the C++17 standard**:

However, these rules are hard to internalize and confusing -- for instance, an expression that creates a temporary object designates an object, so why is it not an lvalue? Why is NonMoveable().arr an xvalue rather than a prvalue? This paper suggests a rewording of these rules to clarify their intent. In particular, we suggest the following definitions for glvalue and prvalue:

  • A glvalue is an expression whose evaluation computes the location of an object, bit-field, or function.
  • A prvalue is an expression whose evaluation initializes an object, bit-field, or operand of an operator, as specified by the context in which it appears.

That is: prvalues perform initialization, glvalues produce locations.

Denotationally, we have:

  • glvalue :: Environment -> (Environment, Location)
  • prvalue :: (Environment, Location) -> Environment

So far, this is not a functional change to C++; it does not change the classification of any existing expression. However, it makes it simpler to reason about why expressions are classified as they are:

struct X { int n; };
extern X x;
X{4};   // prvalue: represents initialization of an X object
x.n;    // glvalue: represents the location of x's member n
X{4}.n; // glvalue: represents the location of X{4}'s member n;
        //          in particular, xvalue, as member is expiring

Basically, Smith only reworded Stroustrup’s definition of a prvalue from ‘does not have identity’ to ‘performs initialization’.

I am still unclear about the following things (so these are my questions):

  1. The meaning of Smith’s notations ‘glvalue :: Environment -> (Environment, Location)’ and ‘prvalue :: (Environment, Location) -> Environment’.
  2. Why Smith’s expression X{4}.n is not a prvalue under the C++17 standard**, given that its evaluation initializes the complete object X{4} (through the ‘temporary materialization conversion’) and in particular its subobject n.
  3. Why Smith’s expression X{4}.n is not a prvalue under the C++11 standard*, given that it denotes a subobject of a temporary object.

Notes

* The value categories of expressions in the C++11 standard, [basic.lval/1] (bold emphasis mine):

  • An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [ Example: If E is an expression of pointer type, then *E is an lvalue expression referring to the object or function to which E points. As another example, the result of calling a function whose return type is an lvalue reference is an lvalue. — end example ]
  • An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime (so that its resources may be moved, for example). An xvalue is the result of certain kinds of expressions involving rvalue references ([dcl.ref]). [ Example: The result of calling a function whose return type is an rvalue reference is an xvalue. — end example ]
  • A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
  • An rvalue (so called, historically, because rvalues could appear on the right-hand side of an assignment expression) is an xvalue, a temporary object ([class.temporary]) or subobject thereof, or a value that is not associated with an object.
  • A prvalue (“pure” rvalue) is an rvalue that is not an xvalue. [ Example: The result of calling a function whose return type is not a reference is a prvalue. The value of a literal such as 12, 7.3e5, or true is also a prvalue. — end example ]

** The value categories of expressions in the C++17 standard, [basic.lval/1] (bold emphasis mine):

  • A glvalue is an expression whose evaluation determines the identity of an object, bit-field, or function.
  • A prvalue is an expression whose evaluation initializes an object or a bit-field, or computes the value of the operand of an operator, as specified by the context in which it appears.
  • An xvalue is a glvalue that denotes an object or bit-field whose resources can be reused (usually because it is near the end of its lifetime). [ Example: Certain kinds of expressions involving rvalue references yield xvalues, such as a call to a function whose return type is an rvalue reference or a cast to an rvalue reference type. — end example ]
  • An lvalue is a glvalue that is not an xvalue.
  • An rvalue is a prvalue or an xvalue.

Upvotes: 3

Views: 745

Answers (1)

Davis Herring

Reputation: 39798

  1. This has been largely answered in the comments, but to elaborate: the semantics of any imperative system can be expressed without side effects by treating the state of “the world” (starting with all of RAM) as an argument to a function and as (part of) its return value. This notation indicates that evaluating a glvalue selects an address (the identity of an object) from that environment (and possibly alters it), whereas evaluating a prvalue requires such a location and alters the environment to contain an initialized object there (possibly with other side effects).
  2. X{4}.n doesn’t initialize n (with what, itself?); it allows access to (i.e., identifies) the value established by just X{4} (which is materialized so as to have a particular n to identify).
  3. You’re right about its temporary status, but that just makes it an rvalue; a prvalue is an rvalue that is not also an xvalue.

Upvotes: 3
