Zan Lynx

Reputation: 54355

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

I have seen it asserted several times now that the following code is not allowed by the C++ Standard:

int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];

Is &array[5] legal C++ code in this context?

I would like an answer with a reference to the Standard if possible.

It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?

Upvotes: 89

Views: 10253

Answers (12)

Jerry Coffin

Reputation: 490518

Preamble

Quite a few of the answers here are fairly old, and quote relatively old versions of the C++ standard (or drafts thereof). Others are based on the C standard; C99 was revised specifically to make this legal, with defined behavior, but that doesn't mean a matching change was made in C++. It looks like the text in the C++ standard has changed somewhat over time, so it may be unclear how meaningful some of the older citations are for C++ as currently defined.

Since the wording has changed over time, I'm going to cite a couple of specific drafts of the C++ standard. If later drafts revise the wording again (which wouldn't surprise me) the issue would have to be analyzed again with respect to the revised wording.

N4835

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise. The expression E1 is sequenced before the expression E2.

So, array[5] is equivalent to *(array + 5).
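A quick way to see the equivalence quoted above: for in-bounds indices, array[i], *(array + i) and i[array] all designate the same element (the last form compiles because the rule only requires one operand to be the array or pointer and the other to be integral). This is a minimal sketch; the function name is hypothetical, and index 5, the case under dispute, is deliberately not exercised.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when array[i], *(array + i) and i[array] designate the same
// element of a 5-element array. Only in-bounds indices (0..4) are accepted,
// so no one-past-the-end element is ever read.
bool subscript_matches_pointer_form(std::size_t i) {
    assert(i < 5 && "only in-bounds indices are checked here");
    int array[5] = {10, 20, 30, 40, 50};
    bool same_value = array[i] == *(array + i);  // E1[E2] vs *((E1)+(E2))
    bool same_object = &array[i] == array + i;   // same address
    bool commuted = &i[array] == &array[i];      // operands in either order
    return same_value && same_object && commuted;
}
```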

We then attempt to take the address of that expression using the & operator. This is defined as follows (§[expr.unary.op]/3):

The result of the unary & operator is a pointer to its operand.

  • If the operand is a qualified-id naming a non-static or variant member m of some class C with type T, the result has type “pointer to member of class C of type T” and is a prvalue designating C::m.
  • Otherwise, if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object (6.7.1) or function. [Note: In particular, taking the address of a variable of type “cv T” yields a pointer of type “pointer to cv T”. —end note] For purposes of pointer arithmetic (7.6.6) and comparison (7.6.9, 7.6.10), an object that is not an array element whose address is taken in this way is considered to belong to an array with one element of type T.
  • Otherwise, the program is ill-formed.

The first of these three possibilities applies to class members, so it's irrelevant here.

The second applies to an lvalue. So the question is whether *(array + 5) is an lvalue or not. According to §[basic.lval]/1.1:

  • A glvalue is an expression whose evaluation determines the identity of an object, bit-field, or function.
    [...]
  • An xvalue is a glvalue that denotes an object whose resources can be reused (usually because it is near the end of its lifetime).
    [...]
  • An lvalue is a glvalue that is not an xvalue.

While we can form an address one past the end of an array, that address does not determine the identity of an object, bit-field or function. The relevant option would be "object", but there is no object there whose identity it can determine.¹ As such, when array has been defined with N elements, *(array + N) is not an lvalue.

That leaves only the third option: the program is ill-formed.

N4944

N4944 has identical wording for §[expr.sub]/1 as N4835, so I won't quote it again here.

In N4944 the wording with respect to the & operator has changed slightly. It starts with (§[expr.unary.op]/3):

The operand of the unary & operator shall be an lvalue of some type T.

N4944 retains the same definition of an lvalue though:

  • A glvalue is an expression whose evaluation determines the identity of an object, bit-field, or function.
    [...]
  • An xvalue is a glvalue that denotes an object whose resources can be reused (usually because it is near the end of its lifetime).
    [...]
  • An lvalue is a glvalue that is not an xvalue.

As such, again, the result of applying * to a pointer one past the end of an array is not an lvalue, so code that attempts to apply the & operator to it is ill-formed.

Conclusion

In recent versions of the C++ standard, code like:

int array[5];
int *foo = &array[5];

...is ill-formed.


1. Well, it could happen that there's some object at that address, but if so it's an accidental coincidence. Nothing in the standard requires there to be an object at that address.

Upvotes: 1

David Thornley

Reputation: 57066

C++ standard, 5.19, paragraph 4:

An address constant expression is a pointer to an lvalue....The pointer shall be created explicitly, using the unary & operator...or using an expression of array (4.2)...type. The subscripting operator []...can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression.

Looks to me like &array[5] is legal C++, being an address constant expression.

Upvotes: 0

Stack Overflow is garbage

Reputation: 248199

Your example is legal, but only because you're not actually using an out of bounds pointer.

Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):

In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.

The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.
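To make the rule concrete: for int array[5], the pointers array + 0 through array + 5 may be formed, and nothing else. This is a minimal sketch, with a hypothetical helper name offset_of, that guards the valid range with an assert instead of relying on it silently.

```cpp
#include <cassert>
#include <cstddef>

// Returns the distance from the start of a 5-element array to array + i.
// Valid only for 0 <= i <= 5: index 5 is the one-past-the-end position, which
// may be formed and compared but never dereferenced. Forming array + 6 (or
// array - 1) would itself be undefined behavior, hence the precondition.
std::ptrdiff_t offset_of(std::size_t i) {
    assert(i <= 5 && "array + i must stay within [array, array + 5]");
    int array[5] = {};
    int* p = array + i;  // never dereferenced
    return p - array;
}
```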

Here's what the standard has to say on the subject:

5.7:5:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

(emphasis mine)

Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:

5.2.1:1:

The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Of course, there's an obvious caveat: Your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:

[Note: for instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. —end note ]

Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.

Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:

  • array + 5 doesn't actually dereference anything, it simply creates a pointer to one past the end of array.
  • &array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
  • &array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.

So they don't do quite the same thing, although in this case, the end result is the same.
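For comparison, here is a sketch (the helper name end_of is hypothetical) that obtains the one-past-the-end pointer by the two routes everyone agrees are well defined and checks that they coincide. The disputed &array[5] appears only in a comment.

```cpp
#include <cassert>

// Returns the one-past-the-end pointer of `array`, obtained two different
// ways, after checking (in debug builds) that both ways agree.
int* end_of(int (&array)[5]) {
    int* via_addition = array + 5;          // pointer arithmetic only
    int* via_last_element = &array[4] + 1;  // address of a real element, plus one
    // int* via_subscript = &array[5];      // the disputed form; not used here
    assert(via_addition == via_last_element);
    return via_addition;
}
```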

Upvotes: 44

Loki Astari

Reputation: 264641

This is legal:

int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];

Section 5.2.1 Subscripting: The expression E1[E2] is identical (by definition) to *((E1)+(E2))

So by this we can say that array_end is equivalent to:

int *array_end = &(*((array) + 5)); // or &(*(array + 5))

Section 5.3.1.1 Unary operator '*': The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to an rvalue, see 4.1. — end note ]

The important part of the above:

'the result is an lvalue referring to the object or function'.

The unary operator '*' returns an lvalue referring to the int (no dereference). The unary operator '&' then gets the address of the lvalue.
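The value-category part of this reading can be checked at compile time with decltype, which does not evaluate its operand, so nothing is read or written even for array + 5. This is a sketch showing only how the language classifies each expression form; it does not by itself settle whether evaluating &*(array + 5) is defined.

```cpp
#include <type_traits>

int array[5];

// decltype does not evaluate its operand: these checks inspect types and
// value categories only, they never touch memory.
static_assert(std::is_same<decltype(array + 5), int*>::value,
              "array + 5 is a prvalue of type int*");
static_assert(std::is_same<decltype(*(array + 5)), int&>::value,
              "unary * yields an lvalue, so decltype reports int&");
static_assert(std::is_same<decltype(&*(array + 5)), int*>::value,
              "& applied to that lvalue yields a prvalue of type int*");
```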

As long as there is no de-referencing of an out of bounds pointer then the operation is fully covered by the standard and all behavior is defined. So by my reading the above is completely legal.

The fact that a lot of the STL algorithms depend on the behavior being well defined is a sort of hint that the standards committee has already thought of this, and I am sure there is something that covers this explicitly.
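As an illustration of that dependence: std::find is specified over a half-open range [first, last), and when the value is absent it returns last, i.e. the one-past-the-end pointer. A minimal sketch with a hypothetical helper index_of:

```cpp
#include <algorithm>
#include <cstddef>

// Returns the index of `value` in `array`, or 5 when it is absent.
// std::find searches the half-open range [first, last); `last` is the
// one-past-the-end pointer, which is compared against but never dereferenced.
std::ptrdiff_t index_of(const int (&array)[5], int value) {
    const int* first = array;
    const int* last = array + 5;
    const int* it = std::find(first, last, value);
    return it - first;  // 5 means "not found", i.e. it == last
}
```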

The comment section below presents two arguments:

(please read: but it is long and both of us end up trollish)

Argument 1

this is illegal because of section 5.7 paragraph 5

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Though the section is relevant, it does not show undefined behavior. All the pointers we are talking about point either within the array or one past the end (which is well defined by the above paragraph).

Argument 2:

The second argument presented below is that * is the de-reference operator.
Though this is a common term used to describe the '*' operator, the standard deliberately avoids it, as the term 'de-reference' is not well defined in terms of the language and what that means to the underlying hardware.

Accessing the memory one beyond the end of the array is definitely undefined behavior, but I am not convinced the unary * operator accesses the memory (reads/writes to memory) in this context (not in a way the standard defines). In this context (as defined by the standard (see 5.3.1.1)) the unary * operator returns an lvalue referring to the object. In my understanding of the language this is not access to the underlying memory. The result of this expression is then immediately used by the unary & operator, which returns the address of the object referred to by that lvalue.

Many other references to Wikipedia and non-canonical sources are presented, all of which I find irrelevant. C++ is defined by the standard.

Conclusion:

I am willing to concede there are many parts of the standard that I may not have considered and that may prove my above arguments wrong. None are provided below. If you show me a standard reference that shows this is UB, I will:

  1. Leave the answer.
  2. Put in all caps this is stupid and I am wrong for all to read.

This is not an argument:

Not everything in the entire world is defined by the C++ standard. Open your mind.

Upvotes: 3

JohnB

Reputation: 13733

It should be undefined behaviour, for the following reasons:

  1. Trying to access out-of-bounds elements results in undefined behaviour. Hence the standard does not forbid an implementation throwing an exception in that case (i.e. an implementation checking bounds before an element is accessed). If & (array[size]) were defined to be begin (array) + size, an implementation throwing an exception in case of out-of-bound access would not conform to the standard anymore.

  2. It's impossible to make this yield end (array) if array is not an array but rather an arbitrary collection type.
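The standard library already contains both situations described above. A sketch, assuming std::vector as the collection type: std::vector::at is required to throw std::out_of_range for any index >= size(), including the one-past-the-end index, while the end position is obtained from data() + size() (or end()) without touching any element. The helper names are hypothetical.

```cpp
#include <stdexcept>
#include <vector>

// For std::vector, v[v.size()] is undefined behavior, and the bounds-checked
// accessor v.at(v.size()) is required to throw std::out_of_range.
bool at_end_throws(const std::vector<int>& v) {
    try {
        (void)v.at(v.size());  // one past the last valid index
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// The one-past-the-end position, computed without accessing any element.
const int* end_pointer(const std::vector<int>& v) {
    return v.data() + v.size();  // never dereferenced
}
```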

Upvotes: 1

rlbond

Reputation: 67839

Even if it is legal, why depart from convention? array + 5 is shorter anyway, and in my opinion, more readable.

Edit: If you want it to be symmetric you can write

int* array_begin = array; 
int* array_end = array + 5;
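Since C++11 there is also std::begin/std::end from <iterator>, which gives the same symmetry without repeating the size; for a built-in array, std::end(array) is array + 5. A small sketch (the sum helper is hypothetical):

```cpp
#include <iterator>

// Sums a 5-element array by walking [std::begin(array), std::end(array)).
// Only elements strictly before the end pointer are read.
int sum(const int (&array)[5]) {
    int total = 0;
    for (const int* p = std::begin(array); p != std::end(array); ++p) {
        total += *p;
    }
    return total;
}
```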

Upvotes: 1

Matthew Flaschen

Reputation: 284967

Working draft (n2798):

"The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. In the first case, if the type of the expression is “T,” the type of the result is “pointer to T.”" (p. 103)

array[5] is not a qualified-id as best I can tell (the list is on p. 87); the closest would seem to be identifier, but while array is an identifier array[5] is not. It is not an lvalue because "An lvalue refers to an object or function. " (p. 76). array[5] is obviously not a function, and is not guaranteed to refer to a valid object (because array + 5 is after the last allocated array element).

Obviously, it may work in certain cases, but it's not valid C++ or safe.

Note: It is legal to add to get one past the array (p. 113):

"if the expression P [a pointer] points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow"

But it is not legal to do so using &.
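The two halves of the passage quoted above can be exercised directly: P + 1 from the last element reaches the one-past-the-end pointer, and Q - 1 from there returns to the last element. Only real elements are ever dereferenced. A sketch with a hypothetical helper name:

```cpp
// P points to the last element; Q = P + 1 is one past the end; Q - 1 is the
// last element again. Q itself is only compared, never dereferenced.
bool round_trip_through_end() {
    int array[5] = {0, 0, 0, 0, 7};
    int* P = &array[4];  // last element
    int* Q = P + 1;      // one past the end: may be formed, not dereferenced
    return Q == array + 5 && Q - 1 == P && *(Q - 1) == 7;
}
```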

Upvotes: 2

CB Bailey

Reputation: 792827

I don't believe that it is illegal, but I do believe that the behaviour of &array[5] is undefined.

  • 5.2.1 [expr.sub] E1[E2] is identical (by definition) to *((E1)+(E2))

  • 5.3.1 [expr.unary.op] unary * operator ... the result is an lvalue referring to the object or function to which the expression points.

At this point you have undefined behaviour because the expression ((E1)+(E2)) didn't actually point to an object, and the standard doesn't say what the result should be unless it does.

  • 1.3.12 [defns.undefined] Undefined behaviour may also be expected when this International Standard omits the description of any explicit definition of behaviour.

As noted elsewhere, array + 5 and &array[0] + 5 are valid and well defined ways of obtaining a pointer one beyond the end of array.

Upvotes: 10

Adam Rosenfield

Reputation: 400562

Yes, it's legal. From the C99 draft standard:

§6.5.2.1, paragraph 2:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

§6.5.3.2, paragraph 3 (emphasis mine):

The unary & operator yields the address of its operand. If the operand has type ‘‘type’’, the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.

§6.5.6, paragraph 8:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.

Upvotes: 45

Todd Gardner

Reputation: 13521

In addition to the above answers, I'll point out that operator& can be overloaded for classes. So even if it was valid for PODs, it probably isn't a good idea to do for an object you know isn't valid (much like overloading operator&() in the first place).
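When operator& might be overloaded, std::addressof from <memory> (C++11) returns the object's real address regardless. A sketch; the Weird type and its deliberately misleading operator& are hypothetical:

```cpp
#include <memory>

// A hypothetical type whose unary operator& does not return its address.
struct Weird {
    int value = 0;
    Weird* operator&() { return nullptr; }  // deliberately misleading
};

// &w calls the overload; std::addressof(w) bypasses it.
bool addressof_ignores_overload() {
    Weird w;
    Weird* fake = &w;                 // calls Weird::operator&
    Weird* real = std::addressof(w);  // actual address of w
    return fake == nullptr && real != nullptr && real->value == 0;
}
```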

Upvotes: 8

Richard Corden

Reputation: 21721

I believe that this is legal, and that it depends on whether the 'lvalue to rvalue' conversion takes place. The last line of Core issue 232 says:

We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior

Although this is a slightly different example, what it does show is that the '*' does not result in an lvalue-to-rvalue conversion, and so, given that the expression is the immediate operand of '&', which expects an lvalue, the behaviour is defined.

Upvotes: 11

Tyler McHenry

Reputation: 76740

It is legal.

According to the gcc documentation for C++, &array[5] is legal. In both C++ and in C you may safely address the element one past the end of an array - you will get a valid pointer. So &array[5] as an expression is legal.

However, it is still undefined behavior to attempt to dereference pointers to unallocated memory, even if the pointer points to a valid address. So attempting to dereference the pointer generated by that expression is still undefined behavior (i.e. illegal) even though the pointer itself is valid.

In practice, I imagine it would usually not cause a crash, though.

Edit: By the way, this is generally how the end() iterator for STL containers is implemented (as a pointer to one-past-the-end), so that's a pretty good testament to the practice being legal.
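Whatever a particular library's end() iterator actually is (that is implementation-specific), a raw one-past-the-end pointer already works as an end iterator anywhere the standard library accepts an iterator range. A minimal sketch using the std::vector range constructor; the helper name copy_array is hypothetical:

```cpp
#include <vector>

// Raw pointers satisfy the iterator requirements, so array + 5 can serve as
// the `last` iterator: the range constructor copies [array, array + 5).
std::vector<int> copy_array(const int (&array)[5]) {
    return std::vector<int>(array, array + 5);
}
```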

Edit: Oh, now I see you're not really asking if holding a pointer to that address is legal, but if that exact way of obtaining the pointer is legal. I'll defer to the other answerers on that.

Upvotes: 16
