Reputation: 223699
Section 6.5.9 of the C standard regarding the == and != operators states the following:
2 One of the following shall hold:
- both operands have arithmetic type;
- both operands are pointers to qualified or unqualified versions of compatible types;
- one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void; or
- one operand is a pointer and the other is a null pointer constant.
...
6 Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.109)
7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
Footnote 109:
109) Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
This would seem to indicate you could do the following:
int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);
This should be legal since we are using an address one element past the end of an array (which in this case is a single object treated as an array of size 1) without dereferencing it. More importantly, one of these two statements would be required to output 1 if one variable immediately followed the other in memory.
However, testing didn't seem to pan this out. Given the following test program:
#include <stdio.h>

struct s {
    int a;
    int b;
};

int main()
{
    int a;
    int b;
    int *x = &a;
    int *y = &b;

    printf("sizeof(int)=%zu\n", sizeof(int));
    printf("&a=%p\n", (void *)&a);
    printf("&b=%p\n", (void *)&b);
    printf("x=%p\n", (void *)x);
    printf("y=%p\n", (void *)y);

    printf("addr: a precedes b: %d\n", ((&a)+1) == &b);
    printf("addr: b precedes a: %d\n", &a == ((&b)+1));
    printf("pntr: a precedes b: %d\n", (x+1) == y);
    printf("pntr: b precedes a: %d\n", x == (y+1));

    printf(" x=%p, &a=%p\n", (void *)(x), (void *)(&a));
    printf("y+1=%p, &b+1=%p\n", (void *)(y+1), (void *)(&b+1));

    struct s s1;
    x=&s1.a;
    y=&s1.b;
    printf("addr: s.a precedes s.b: %d\n", ((&s1.a)+1) == &s1.b);
    printf("pntr: s.a precedes s.b: %d\n", (x+1) == y);
    return 0;
}
Compiler is gcc 4.8.5, system is CentOS 7.2 x64.
With -O0, I get the following output:
sizeof(int)=4
&a=0x7ffe9498183c
&b=0x7ffe94981838
x=0x7ffe9498183c
y=0x7ffe94981838
addr: a precedes b: 0
addr: b precedes a: 0
pntr: a precedes b: 0
pntr: b precedes a: 1
x=0x7ffe9498183c, &a=0x7ffe9498183c
y+1=0x7ffe9498183c, &b+1=0x7ffe9498183c
addr: s.a precedes s.b: 1
We can see here that an int is 4 bytes and that the address of a is 4 bytes past the address of b, and that x holds the address of a while y holds the address of b. However, the comparison &a == ((&b)+1) evaluates to false while the comparison x == (y+1) evaluates to true. I would expect both to be true, as the addresses being compared appear identical.
With -O1, I get this:
sizeof(int)=4
&a=0x7ffca96e30ec
&b=0x7ffca96e30e8
x=0x7ffca96e30ec
y=0x7ffca96e30e8
addr: a precedes b: 0
addr: b precedes a: 0
pntr: a precedes b: 0
pntr: b precedes a: 0
x=0x7ffca96e30ec, &a=0x7ffca96e30ec
y+1=0x7ffca96e30ec, &b+1=0x7ffca96e30ec
addr: s.a precedes s.b: 1
pntr: s.a precedes s.b: 1
Now both comparisons evaluate to false even though (as before) the addresses being compared appear to be the same.
This seems to point to undefined behavior, but based on how I read the above passage it seems this should be allowed.
Note also that the comparison of the addresses of adjacent objects of the same type in a struct prints the expected result in all cases.
Am I misreading something here regarding what is allowed (meaning this is UB), or is this version of gcc non-conforming in this case?
Upvotes: 44
Views: 3107
Reputation: 263207
Can an equality comparison of unrelated pointers evaluate to true?
Yes, but ...
int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);
There are, by my interpretation of the C standard, three possibilities: a immediately precedes b, b immediately precedes a, or neither immediately precedes the other.
I played around with this some time ago and concluded that GCC was performing an invalid optimization on the == operator for pointers, making it yield false even when the addresses are the same, so I submitted a bug report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63611
That bug was closed as a duplicate of another report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
The GCC maintainers who responded to these bug reports seem to be of the opinion that adjacency of two objects need not be consistent and that the comparison of their addresses might show them to be adjacent or not, within the same run of the program. As you can see from my comments on the second Bugzilla ticket, I strongly disagree. In my opinion, without consistent behavior of the == operator, the standard's requirements for adjacent objects are meaningless, and I think we have to assume that those words are not merely decorative.
Here's a simple test program:
#include <stdio.h>

int main(void) {
    int x;
    int y;
    printf("&x = %p\n&y = %p\n", (void*)&x, (void*)&y);
    if (&y == &x + 1) {
        puts("y immediately follows x");
    }
    else if (&x == &y + 1) {
        puts("x immediately follows y");
    }
    else {
        puts("x and y are not adjacent");
    }
}
When I compile it with GCC 6.2.0, the printed addresses of x and y differ by exactly 4 bytes at all optimization levels, but I get "y immediately follows x" only at -O0; at -O1, -O2, and -O3 I get "x and y are not adjacent". I believe this is incorrect behavior, but apparently it's not going to be fixed.
clang 3.8.1, in my opinion, behaves correctly, showing "x immediately follows y" at all optimization levels. Clang previously had a problem with this; I reported it:
https://bugs.llvm.org/show_bug.cgi?id=21327
and it was corrected.
I suggest not relying on comparisons of addresses of possibly adjacent objects behaving consistently.
(Note that relational operators (<, <=, >, >=) on pointers to unrelated objects have undefined behavior, but equality operators (==, !=) are generally required to behave consistently.)
Upvotes: 29
Reputation: 81105
The authors of the Standard weren't trying to make it "language-lawyer-proof", and as a consequence, it is somewhat ambiguous. Such ambiguity will not generally be a problem when compiler writers make a bona fide effort to uphold the Principle of Least Astonishment, since there is a clear non-astonishing behavior, and any other behavior would have astonishing consequences. On the other hand, it does mean those compiler writers who are more interested in whether optimizations can be justified under any reading of the Standard than in whether they will be compatible with existing code can find interesting opportunities to justify incompatibility.
The Standard doesn't require that pointers' representations bear any relationship to the underlying physical architecture. It would be perfectly legitimate for a system to represent each pointer as a combination of a handle and an offset. A system which represented pointers in such fashion would be free to move the objects represented thereby around in physical storage as it saw fit. On such a system, the first byte of object #57 might follow immediately after the last byte of object #23 at one moment in time, but might be at some completely unrelated location at some other moment. I see nothing in the Standard that would prohibit such an implementation from reporting a "just past" pointer for object #23 as equal to a pointer to object #57 when the two objects happened to be adjacent, and as unequal when they happened not to be.
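To make the handle-and-offset idea concrete, here is a minimal sketch of such a scheme. Everything in it is hypothetical and invented for illustration: the struct hptr type, the base_of table, and the hptr_equal function do not correspond to any real implementation. It only shows how an equality test that compares current physical locations could give different answers at different times.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pointer representation: which object, and how far into it. */
struct hptr {
    int handle;
    size_t offset;
};

/* Hypothetical table recording where each object currently lives in
   "physical" storage. The system may rewrite an entry whenever it
   relocates the corresponding object. */
#define MAX_OBJECTS 64
static size_t base_of[MAX_OBJECTS];

/* Equality that compares the current physical locations, so the answer
   depends on where the objects happen to be at the moment of the call. */
bool hptr_equal(struct hptr p, struct hptr q)
{
    return base_of[p.handle] + p.offset == base_of[q.handle] + q.offset;
}
```

With object #23 at base 100 and a 4-byte size, and object #57 at base 104, the "just past" pointer {23, 4} compares equal to {57, 0}. After the system moves object #57 elsewhere, the same two pointer values compare unequal.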
Further, under the as-if rule, an implementation that would be justified in moving objects around in such fashion, and in having a quirky equality operator as a result, would be allowed to have a quirky equality operator whether or not it physically moved objects around in storage.
If, however, an implementation specifies how pointers are stored in RAM, and such a definition would be inconsistent with the behavior described above, that would compel the implementation to implement the equality operator in a fashion consistent with that specification. Any compiler that wants to have a quirky equality operator must refrain from specifying a pointer-storage format that would be inconsistent with such behavior.
Further, the Standard would seem to imply that if code observes that two pointers with defined values have identical representations, they must compare equal. Reading an object using a character type and then writing that same sequence of character-type values into another object should yield an object equivalent to the original; such equivalence is a fundamental feature of the language. If p is a pointer "just past" one object, and q is a pointer to another object, and their representations are copied to p2 and q2, respectively, then p2 must compare equal to p and q2 to q. If the decomposed character-type representations of p and q are equal, that would imply that q2 was written with the same sequence of character-type values as p2, which would, in turn, imply that all four pointers must be equal.
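The copy-through-bytes step in this argument can be sketched as follows. The helper names same_representation and copy_via_bytes are made up for this illustration; the only library calls are memcpy and memcmp from <string.h>.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when the object representations (the raw bytes) of two
   int pointers are identical. */
bool same_representation(int *p, int *q)
{
    return memcmp(&p, &q, sizeof p) == 0;
}

/* Rebuilds a pointer from the bytes of another one, the way code that
   passes a pointer through unsigned char storage would. */
int *copy_via_bytes(int *p)
{
    unsigned char bytes[sizeof p];
    int *copy;

    memcpy(bytes, &p, sizeof p);       /* read p as character-type values */
    memcpy(&copy, bytes, sizeof copy); /* write the same values into copy */
    return copy;
}
```

A pointer returned by copy_via_bytes(p) is expected to compare equal to p. The argument above is that if same_representation(p, q) also holds, the copies of p and q were built from the same bytes, so it is hard to see how p == q could be allowed to yield false.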
Consequently, while it would be allowable for a compiler to have quirky equality semantics for pointers which are never exposed to code that might observe their byte-level representation, such behavioral license would not extend to pointers which are thus exposed. If an implementation defines a directive or setting that invites compilers to have individual comparisons arbitrarily report equal or unequal when given pointers to the end of one object and the start of another whose placement would only be observable via such comparison, the implementation wouldn't have to worry about conformance in cases where pointer representations are observed. Otherwise, though, even if there are cases where conforming implementations would be allowed to have quirky comparison semantics, that doesn't mean any quality implementation should do so unless invited, or unless a pointer just past the end of one object would naturally have a different representation from a pointer to the start of the next.
Upvotes: 5
Reputation: 234645
int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);
is perfectly well-defined code, but probably more by luck than by judgement.
You are allowed to take the address of a scalar and form a pointer one past that address. So &a + 1 is valid, but &a + 2 is not. You are also allowed to compare a pointer with any other valid pointer of the same type using == and !=, although pointer arithmetic is only valid within arrays.
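As a small illustration of why the one-past pointer is useful, here is a sketch of a half-open range loop. The function name sum_range is hypothetical; the point is that the end pointer is only compared with !=, never dereferenced, so a lone int can be passed as if it were an array of length one.

```c
/* Sums the ints in the half-open range [first, last). The last pointer
   is only compared against, never dereferenced. */
long sum_range(const int *first, const int *last)
{
    long total = 0;
    for (const int *p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}
```

For a single variable int a, the call sum_range(&a, &a + 1) visits exactly one element. Passing &a + 2 as the end would already be invalid at the point where that pointer is formed.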
Your assertion that the addresses of a and b tell you anything about how these are placed in memory is bunk. To be clear, you cannot "reach" b by pointer arithmetic on the address of a.
As for
struct s {
int a;
int b;
};
The standard guarantees that the address of the struct is the same as the address of a, but an arbitrary amount of padding is allowed to be inserted between a and b. Again, you can't reach the address of b by any pointer arithmetic on the address of a.
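If you want to know whether a particular implementation actually put padding between the two members, offsetof from <stddef.h> will tell you without any pointer arithmetic across members. This is only a sketch; the function name s_members_adjacent is made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

struct s {
    int a;
    int b;
};

/* True when member b starts exactly where member a ends, i.e. this
   implementation inserted no padding between them. */
bool s_members_adjacent(void)
{
    return offsetof(struct s, b) == offsetof(struct s, a) + sizeof(int);
}
```

The first member is guaranteed to sit at offset 0. The result of s_members_adjacent() is up to the implementation, although for two int members it would be unusual to see padding.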
Upvotes: 15
Reputation: 153338
Can an equality comparison of unrelated pointers evaluate to true?
Yes. C specifies when this is true.
Two pointers compare equal if and only if ... or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. C11dr §6.5.9 6
To be clear: adjacent variables in code do not need to be adjacent in memory, yet can be.
The code below demonstrates that it is possible. It uses a memory dump of an int * in addition to the conventional "%p" and (void *).
Yet OP's code and output do not reflect this. Given the "compare equal if and only if" part of the above spec, IMO, OP's compilation is non-compliant. For variables p, q of the same type that are adjacent in memory, either &p+1 == &q or &p == &q+1 must be true.
No opinion on whether this holds if the objects differ in type; OP does not ask that in any case.
#include <stdio.h>
#include <stdlib.h>

void print_int_ptr(const char *prefix, int *p) {
    printf("%s %p", prefix, (void *) p);
    union {
        int *ip;
        unsigned char uc[sizeof (int*)];
    } u = {p};
    for (size_t i=0; i< sizeof u; i++) {
        printf(" %02X", u.uc[i]);
    }
    printf("\n");
}

int main(void) {
    int b = rand();
    int a = rand();
    printf("sizeof(int) = %zu\n", sizeof a);
    print_int_ptr("&a =", &a);
    print_int_ptr("&a + 1 =", &a + 1);
    print_int_ptr("&b =", &b);
    print_int_ptr("&b + 1 =", &b + 1);
    printf("&a + 1 == &b: %d\n", &a + 1 == &b);
    printf("&a == &b + 1: %d\n", &a == &b + 1);
    return a + b;
}
Output
sizeof(int) = 4
&a = 0x28cc28 28 CC 28 00
&a + 1 = 0x28cc2c 2C CC 28 00 <-- same bit pattern
&b = 0x28cc2c 2C CC 28 00 <-- same bit pattern
&b + 1 = 0x28cc30 30 CC 28 00
&a + 1 == &b: 1 <-- compare equal
&a == &b + 1: 0
Upvotes: 8