Reputation: 21364
Is the behavior of the following program undefined?
#include <stdio.h>
int main(void)
{
int arr[2][3] = { { 1, 2, 3 },
{ 4, 5, 6 }
};
int *ptr1 = &arr[0][0]; // pointer to first elem of { 1, 2, 3 }
int *ptr3 = ptr1 + 2; // pointer to last elem of { 1, 2, 3 }
int *ptr3_plus_1 = ptr3 + 1; // pointer to one past last elem of { 1, 2, 3 }
int *ptr4 = &arr[1][0]; // pointer to first elem of { 4, 5, 6 }
// int *ptr_3_plus_2 = ptr3 + 2; // this is not legal
/* It is legal to compare ptr3_plus_1 and ptr4 */
if (ptr3_plus_1 == ptr4) {
puts("ptr3_plus_1 == ptr4");
/* ptr3_plus_1 is a valid address, but is it legal to dereference it? */
printf("*ptr3_plus_1 = %d\n", *ptr3_plus_1);
} else {
puts("ptr3_plus_1 != ptr4");
}
return 0;
}
According to §6.5.6 ¶8:
Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object.... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
From this, it would appear that the behavior of the above program is undefined; ptr3_plus_1
points to an address one past the end of the array object from which it is derived, and dereferencing this address causes undefined behavior.
Further, Annex J.2 suggests that this is undefined behavior:
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
There is some discussion of this issue in the Stack Overflow question, One-dimensional access to a multidimensional array: well-defined C?. The consensus here appears to be that this kind of access to arbitrary elements of a two-dimensional array through one-dimensional subscripts is indeed undefined behavior.
The issue, as I see it, is that it is not even legal to form the address of the pointer ptr3_plus_2
, so it is not legal to access arbitrary two-dimensional array elements in this way. But, it is legal to form the address of the pointer ptr3_plus_1
using this pointer arithmetic. Further, it is legal to compare the two pointers ptr3_plus_1
and ptr4
, according to §6.5.9 ¶6:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
So, if it both ptr3_plus_1
and ptr4
are valid pointers that compare equal and that must point to the same address (the object pointed to by ptr4
must be adjacent in memory to the object pointed to by ptr3
anyway, since array storage must be contiguous), it would seem that *ptr3_plus_1
is as valid as *ptr4
.
Is this undefined behavior, as described in §6.5.6 ¶8 and Annex J.2, or is this an exceptional case?
To Clarify
It seems unambiguous that it is undefined behavior to attempt to access the element one past the end of the final row of a two-dimensional array. My interest is in the question of whether it is legal to access the first element of the intermediate rows by forming a new pointer using a pointer to an element from the previous row and pointer arithmetic. It seems to me that a different example in Annex J.2 could have made this more clear.
Is it possible to reconcile the clear statement in §6.5.6 ¶8 that an attempted dereference of a pointer to the location one past the end of an array leads to undefined behavior with the idea that the pointer past the end of the first row of a two-dimensional array of type T[][] is also a pointer of type T * that points to an object of type T, namely the first element of an array of type T[]?
Upvotes: 5
Views: 345
Reputation: 120051
A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong or contrary to the standard about such representation. So any pointer arithmetic that brings the pointer outside the range of [base, base+size] fails, and any dereference outside of [base, base+size) also fails.
Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).
It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
Upvotes: 4
Reputation:
So, if it both
ptr3_plus_1
andptr4
are valid pointers that compare equal and that must point to the same address
They are.
it would seem that
*ptr3_plus_1
is as valid as*ptr4
.
It is not.
The pointers are equal, but not equivalent. The trivial well-known example of the distinction between equality and equivalence is negative zero:
double a = 0.0, b = -0.0;
assert (a == b);
assert (1/a != 1/b);
Now, to be fair, there is a difference between the two, as positive and negative zero have a different representation, ptr3_plus_1
and ptr4
on typical implementations have the same representation. This is not guaranteed, and on implementations where they would have different representations, it should be clear that your code might fail.
Even on the typical implementations, while there are good arguments to be made that the same representation implies equivalent values, to the best of my knowledge, the official interpretation is that the standard does not guarantee this, therefore programs cannot rely on it, therefore implementations can assume programs do not do this and optimise accordingly.
Upvotes: 5