Reputation: 109
I was playing around with some arrays and pointers in c and started wondering whether doing this would be undefined behavior.
int (*arr)[5] = malloc(sizeof(int[5][5]));
// Is this undefined behavior?
int val0 = arr[0][5];
// Rephrased, is it guaranteed it'll always have the same effect as this line?
int val1 = arr[1][0];
Thank you for any insights.
Upvotes: 0
Views: 124
Reputation: 81247
The Standard is clearly intended to avoid requiring that a compiler given something like:
int foo[5][10];
int test(int i)
{
foo[1][0] = 1;
foo[0][i] = 2;
return foo[1][0];
}
must reload the value of foo[1][0]
to accommodate the possibility that the write to foo[0][i]
might affect foo[1][0]
. On the other hand, before the Standard was written, it would have been idiomatic to write something like:
void dump_array(int *p, int rows, int cols)
{
int i,j;
for (i=0; i<rows; i++)
{
for (j=0; j<cols; j++)
printf("%6d", *p++);
printf("\n");
}
}
int foo[5][10];
...
dump_array(foo[0], 5, 10);
and nothing in the published Rationale suggests that the authors had any intention of forbidding such constructs nor breaking code that used them. Indeed, the primary benefit of requiring that rows of an array be placed consecutively, even when adding padding would improve efficiency, is to allow such code to function.
At the time the Standard was written, when generating code for a function that received a pointer, compilers would treat the pointer as though it might identify some arbitrary part of some arbitrary larger object, without making any effort to know or care about what that enclosing object might be. They would thus, as a very popular form of "conforming language extension", support constructs like dump_array
without regard for whether the Standard required them to do so, and consequently the authors of the Standard saw no reason to worry about when the Standard mandated such support. Instead, they left such matters as a Quality of Implementation issue over which the Standard could waive jurisdiction.
Unfortunately, because the authors of the Standard expected that compilers would treat the act of passing a pointer to a function as implicitly "laundering" it, the authors of the Standard saw no need to define any explicit method for laundering information about a pointer's enclosing objects in cases where it would be necessary for a function to treat a pointer identifying "raw" storage. Such distinctions didn't matter given the state of compiler technology in the 1980s, but may be quite relevant if e.g. code does something like:
int matrix[10][10];
void test2(int c)
{
matrix[4][0] = 1;
dump_array(matrix[0], 1, c);
matrix[4][0] = 2;
}
or
void test3(int r)
{
matrix[4][0] = 1;
dump_array((int*)matrix, r, 10);
matrix[4][0] = 2;
}
Depending upon what the functions is intending to do, having a compiler optimize out the first write to matrix[4][0]
in one or both may improve efficiency, or it may cause the generated code to behave uselessly. Treating explicit pointer conversions as erasing type information, but treating array-to-pointer decay as retaining it, would allow programmers to achieve required semantics if they write code as in the second example, while allowing compilers to perform the relevant optimizations when source code is written as in the first example. Unfortunately, the Standard makes no distinctions, and maintainers of free compilers are loath to forego any "optimizations" they view the Standard as giving them, leaving the language with nothing but "hope for the best" semantics except on implementations that either refrain from cross-procedural optimizations or document what needs to be done to block them.
Upvotes: 1
Reputation: 224447
In C, what you're doing is undefined behavior.
The expression arr[0]
has type int [5]
. So the expression arr[0][5]
dereferences one element past the end of the array arr[0]
, and dereferencing past the end of an array is undefined behavior.
Section 6.5.2.1p2 of the C standard regarding Array Subscripting states:
The definition of the subscript operator
[]
is thatE1[E2]
is identical to(*((E1)+(E2)))
.
And section 6.5.6p8 of the C standard regarding Additive Operators states:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression
P
points to the i-th element of an array object, the expressions(P)+N
(equivalently,N+(P)
) and(P)-N
(whereN
has the value n) point to, respectively, the i+n-th and i−n -th elements of the array object, provided they exist. Moreover, if the expressionP
points to the last element of an array object, the expression(P)+1
points one past the last element of the array object, and if the expressionQ
points one past the last element of an array object,the expression(Q)-1
points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary*
operator that is evaluated.
The part in bold specifies that the addition implicit in an array subscript may not result in a pointer more that one element past the end of an array, and that a pointer to one element past the end of an array may not be defererenced.
The fact that the array in question is itself a member of an array, meaning the elements of each subarray are continuous in memory, doesn't change this. Aggressive optimization settings in the compiler may note that it is undefined behavior to access past the end of the array and make optimizations based on this fact.
Upvotes: 3