Reputation: 61
I am trying to understand pointer comparison operators in C programs.
ISO/IEC 9899:2011 specifies that comparing pointers (using > or <) pointing to different objects is undefined behavior.
However, experimenting, I found that when "unrelated" pointers are compared, all the compilers/interpreters I tested seem to treat them as just "numbers that happen to represent a location in memory".
Is this always the case? If so, why isn't this part of the standard?
To put this differently: can there be an edge case where pointer p points to, say, virtual memory address 0xffff and pointer b to 0x0000, yet (p < b) returns true?
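For reference, a minimal sketch of the kind of test I ran (two unrelated objects compared directly):

#include <stdio.h>

int main(void)
{
    int x, y;
    int* p = &x;
    int* b = &y;
    /* Undefined behavior per C11, but on flat-memory machines this
       "works" and reports whichever address happens to be lower. */
    printf("(p < b) is %d\n", p < b);
    return 0;
}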
Upvotes: 4
Views: 407
Reputation: 47952
Is this always the case?
Most of the time, and on popular architectures with "flat" memory spaces. (Or at least, this used to be the case. As a comment reminds me, this is yet another example of the sort of thing that used to be undefined-but-you-could-probably-get-away-with-it, but is migrating towards undefined-and-don't-touch-it-with-a-ten-foot-pole.)
If so, why isn't this part of the standard?
Because it's absolutely not true all of the time, and C has never been interested in limiting itself to one set of architectures in that sort of way.
In particular, "segmented" memory architectures were once very, very popular (think MS-DOS), and depending on the memory model you used, heterogeneous pointer comparisons definitely didn't work.
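To illustrate (a hedged sketch in portable C, not real DOS code): in a 16-bit "large" memory model a far pointer was a segment:offset pair, its linear address was segment * 16 + offset, and relational comparison typically looked at the offset only. Simulating that:

#include <stdio.h>
#include <inttypes.h>

struct far_ptr { uint16_t seg, off; };  /* simulated segment:offset pointer */

/* The actual address the hardware would access. */
static uint32_t linear(struct far_ptr p) { return (uint32_t)p.seg * 16 + p.off; }

/* Offset-only comparison, as real-mode compilers commonly generated. */
static int far_less(struct far_ptr a, struct far_ptr b) { return a.off < b.off; }

int main(void)
{
    struct far_ptr p = { 0x2000, 0x0001 };  /* linear 0x20001 */
    struct far_ptr q = { 0x1000, 0x00FF };  /* linear 0x100FF */
    printf("linear p=0x%05" PRIX32 ", q=0x%05" PRIX32 "\n", linear(p), linear(q));
    printf("far_less(p, q) = %d, although p's linear address is higher\n",
           far_less(p, q));
    return 0;
}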
Upvotes: 0
Reputation: 171127
Note that "undefined behaviour" does not mean "will crash" or "will do bad stuff." It means "there is no definition of what will happen; literally anything is allowed to happen." And when optimisations get into the picture, literally anything can actually happen, too.
Regarding your observation: you've probably tested this on x86 or x86_64 architecture. On those, it's still likely that you will get the behaviour you've observed (even though it's technically undefined). However, keep in mind that the C specification is intended to work on all platforms and architectures where C can be used, including exotic embedded platforms, specialised hardware etc. On such platforms, I'd be much less certain of the results of such pointer comparisons.
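If you genuinely need an ordering of unrelated pointers, a common approach - sketched below, and assuming the implementation provides the optional uintptr_t type - is to convert each pointer to an integer first. The comparison itself is then well-defined integer comparison, although the resulting order is implementation-defined:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int a, b;
    /* uintptr_t is optional in the standard; where it exists, the
       pointer-to-integer conversion is implementation-defined but valid. */
    uintptr_t ua = (uintptr_t)(void*)&a;
    uintptr_t ub = (uintptr_t)(void*)&b;
    puts(ua < ub ? "a converts lower" : "b converts lower");
    return 0;
}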
Upvotes: 3
Reputation: 123468
Is this always the case?
No. There's no guarantee that separate objects will be laid out in any particular order. There's no guarantee that all objects occupy the same memory segment.
If so, why isn't this part of the standard?
See above.
"Undefined behavior" means exactly this:
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
In plain English, neither the compiler nor the runtime environment is required to handle the situation in any particular way, and the result could quite literally be anything. Your code could crash immediately. You could enter a bad state such that your program crashes elsewhere (those issues are fun to debug, let me tell you). You could corrupt other data. Or your code could appear to run just fine and have no obvious bad effects, which is the worst possible outcome.
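To make the "appears to run just fine" case concrete, here is a hedged sketch using the integer-overflow example from the quoted EXAMPLE clause; the output may differ between compilers and optimization levels:

#include <limits.h>
#include <stdio.h>

/* Kept in a separate function so the compiler reasons about x + 1 > x
   symbolically; with signed overflow undefined, it may fold this to 1. */
static int always_bigger(int x)
{
    return x + 1 > x;  /* undefined behavior when x == INT_MAX */
}

int main(void)
{
    printf("%d\n", always_bigger(INT_MAX));  /* may print 1 or 0 */
    return 0;
}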
Upvotes: 0
Reputation: 213842
Is this always the case? If so, why isn't this part of the standard?
Most of the time, but not necessarily. There are various oddball architectures with segmented memory areas. The C standard also wants to allow pointers to be abstract items that are not necessarily equivalent to physical addresses.
Also, in theory, if you have something like this:
#include <stdio.h>

int main(void)
{
    int a;
    int b;
    int* pa = &a;
    int* pb = &b;

    if (pa < pb) // undefined behavior: a and b are separate objects
        puts("less");
    else
        puts("more");
}
Then the compiler could in theory replace the whole if-else with puts("more"), even if the address of pa is lower than the address of pb, because it is free to deduce that pa and pb cannot be compared, or that comparing them always gives false. This is the danger of undefined behavior - what code the compiler generates is anyone's guess.
In practice, the undefined behavior in the above snippet seems to lead to less efficient code: at -O3, gcc and clang on x86 compile it into two loads of the addresses followed by a run-time comparison, even though the compiler should be able to calculate both addresses at compile time.
When changing the declarations so that the comparison is well-defined - both pointers now point into the same array:
int a[2];
int* pa = &a[0];
int* pb = &a[1];
Then I get much better machine code - the comparison is now calculated at compile time and the whole program is replaced by a simple call to puts("less").
On embedded systems compilers, however, you are almost certainly able to access any address as if it were an integer - as a well-defined, non-standard extension. Otherwise it would be impossible to write things like flash drivers, bootloaders, CRC memory checks etc.
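As a hedged sketch of that embedded idiom (the register address 0x40021000 below is made up for illustration; real hardware would document the actual register map):

#include <stdint.h>

/* The integer-to-pointer cast is the non-standard-but-ubiquitous
   extension in question. */
#define STATUS_REG (*(volatile uint32_t*)0x40021000u)

void wait_until_ready(void)
{
    while ((STATUS_REG & 1u) == 0u)
        ;  /* spin until the hypothetical "ready" bit is set */
}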
Upvotes: 3