Reputation: 31712
I had this argument with some people who said that out-of-bound pointers in C cause undefined behavior even if they are never dereferenced. Example:
int a;
int *p = &a;
p = p - 1;
The third line here will cause undefined behavior even if p is never dereferenced (*p is never used).
In my opinion, it sounds illogical that C would check whether a pointer is out of bounds when the pointer is never used (it's like inspecting people on the street to see whether they're carrying guns in case they ever enter your house, when the sensible thing is to inspect them only when they're about to enter). I also think that if C checked for that, a lot of runtime overhead would be incurred.
Plus, if C really checked for OOB pointers, then why doesn't this cause UB:
int *p; // uninitialized, thus pointing to a random address
In this case, why does nothing happen, even though the chance of p pointing to an OOB address is high?
ADD:
int a;
int *p = &a;
p = p - 1;
Say &a is 1000. After evaluating the third line, will the value of p be exactly 996, with the undefined behavior only referring to the fact that p could later be dereferenced somewhere else and cause the real problem? I think the third line was originally called undefined behavior because of the potential future use (dereferencing) of that OOB pointer, and that people, over time, took the computation itself to be undefined behavior in its own right. Or is the value of p itself undefined? In other words, will p be 100% guaranteed to be 996 while the operation still counts as undefined behavior, or will its value be undefined?
Upvotes: 40
Views: 9403
Reputation: 153
But what is undefined behavior? That simply means no one is willing to say what will happen.
I'm an old mainframe dog from years back, and I like IBM's phrase for the same thing: results are unpredictable.
BTW: I like the idea of NOT checking array bounds. For example, if I have a pointer into a string, and I want to see what's just before the byte being pointed to, I can use pointer[-1] to look at it.
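For instance, here is a minimal sketch of that negative-index trick (the string and position are made up, just to have something concrete):
#include <stdio.h>

int main(void) {
    char text[] = "hello";
    char *pointer = &text[4];                   /* points at the 'o' */
    /* Peek at the byte just before the one being pointed to. This stays
       inside the array, so the negative index is perfectly well-defined. */
    printf("previous byte: %c\n", pointer[-1]); /* prints 'l' */
    return 0;
}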
Upvotes: 0
Reputation: 144969
C does not check whether a pointer is out of bounds. But the underlying hardware might behave in strange ways when an address is computed that falls outside the boundaries of an object (computing the address one past the end of an object being the one exception). The C Standard explicitly describes this as causing undefined behavior.
For most current environments, the above code does not pose a problem, but similar situations could cause segmentation faults in x86 16-bit protected mode, some 25 years ago.
In the language of the Standard, such a value could be a trap representation, something that cannot even be manipulated without invoking undefined behavior.
The pertinent section of the C11 Standard is:
6.5.6 Additive operators
- When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
A similar example of undefined behavior is this:
char *p;
char *q = p;
Merely loading the value of the uninitialized pointer p invokes undefined behavior, even if it is never dereferenced.
EDIT: it is a moot point trying to argue about this. The Standard says computing such an address invokes undefined behavior, so it does. The fact that some implementations might just compute some value and store it or not is irrelevant. Do not rely on any assumptions regarding undefined behavior: the compiler might take advantage of its inherently unpredictable nature to perform optimizations that you cannot imagine.
For example, this loop:
for (int i = 1; i != 0; i++) {
...
}
might compile to an infinite loop without any test at all: i++ invokes undefined behavior if i is INT_MAX, so the compiler's analysis goes like this:
- i starts at 1, so i is > 0.
- If i < INT_MAX, then after i++ it is still > 0.
- If i = INT_MAX, then i++ invokes undefined behavior, so we can assume i > 0 because we can assume anything we please.
Therefore i is always > 0 and the test code can be removed.
Upvotes: 66
Reputation: 81247
Some platforms treat pointers as integers, and process pointer arithmetic in the same fashion as integer arithmetic, but with certain values scaled up or down according to the sizes of objects. On such platforms, this will effectively define a "natural" result of all pointer arithmetic operations except subtraction of pointers whose difference is not a multiple of the size of the pointer's target type.
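As a hedged sketch of that "scaled integer" model (the variable names are made up, and the conversion through uintptr_t is implementation-defined, so this only illustrates what such platforms tend to do, not anything the Standard guarantees):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int a;
    int *p = &a;
    /* On a platform that treats pointers as plain integers, p - 1 would simply
       be the numeric address of a minus sizeof(int). Doing the subtraction on a
       uintptr_t avoids performing the out-of-bounds pointer arithmetic itself. */
    uintptr_t addr = (uintptr_t)p;
    printf("address of a:               %p\n", (void *)p);
    printf("that address - sizeof(int): 0x%jx\n", (uintmax_t)(addr - sizeof(int)));
    return 0;
}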
Other platforms may represent pointers in other ways, and addition or subtraction of certain combinations of pointers may cause unpredictable results.
The authors of the C Standard didn't want to show favoritism toward either kind of platform, so the Standard imposes no requirements on what may happen if pointers are manipulated in ways that would cause problems on some platforms. Before the C Standard, and for quite a few years afterward, programmers could reasonably expect that general-purpose implementations for platforms which treated pointer arithmetic like scaled integer arithmetic would, themselves, treat pointer arithmetic likewise, while implementations for platforms that treated pointer arithmetic differently would likely treat it differently themselves.
In the last decade or so, however, in pursuit of "optimization", compiler writers have decided to throw the Principle of Least Astonishment out the window. Even in cases where a programmer would know what the effect of certain pointer operations would be given a platform's natural pointer representations, there is no guarantee that compilers will generate code that behaves the way those natural pointer representations would behave. The fact that the Standard says behavior is undefined is interpreted as an invitation for compilers to impose "optimizations" that force programmers to write code that is slower and clunkier than it would need to be on implementations that simply behave in a fashion consistent with the documented behaviors of the underlying environment (one of the three treatments that the authors of C89 explicitly noted as being commonplace).
Thus, unless one knows that one is using a compiler without any wacky "optimizations" enabled, the fact that an intermediate step in a sequence of pointer computations invokes Undefined Behavior makes it impossible to reason about that sequence at all, no matter how strongly common sense would imply that a quality implementation for a particular platform should behave in a particular way.
Upvotes: 4
Reputation: 70186
The part of the question relating to undefined behavior is very clear: the answer is "Well, yes, certainly it is undefined behavior".
I will interpret the wording "Does C check..." as the following two questions (C itself is a language specification; it doesn't check, or do, anything):
- Does the compiler check whether a pointer is out of bounds?
- Does the compiled program check, at runtime, whether a pointer is out of bounds?
The answer to the first question is: yes, but not reliably, and not in the way you might wish. Modern compilers are quite smart, sometimes smarter than you'd like. The compiler will, in some cases, be able to diagnose your illegitimate use of pointers. But since this by definition invokes undefined behavior, and the language therefore no longer requires the compiler to do anything in particular, the compiler will often instead optimize in an unpredictable way. This may result in code that is very different from what you originally intended. Do not be surprised if an entire scope, or even the complete function, gets dead-stripped. This is true for many undesirable "surprise optimizations" in relation to undefined behavior.
Obligatory read: What Every C Programmer Should Know About Undefined Behavior.
The answer to the second question is: no, except if you use a compiler that supports bounds checks and you compile with runtime bounds checks enabled, which implies quite a non-trivial runtime overhead.
In practice, this means that if your program "survived" the compiler optimizing out undefined behavior, then it will just stubbornly do what you told it to do, with unpredictable results -- usually either garbage values being read, or your program causing a segmentation fault.
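As a rough illustration (assuming a GCC or Clang toolchain; the array and values are made up): the out-of-bounds read below is typically reported at run time only when instrumentation such as -fsanitize=address is compiled in; otherwise the program just reads whatever happens to sit at that address, or crashes.
#include <stdio.h>

int main(void) {
    int arr[4] = { 1, 2, 3, 4 };
    int *p = arr + 4;    /* one past the end: a valid pointer value */
    printf("%d\n", *p);  /* dereferencing it is an out-of-bounds read */
    return 0;
}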
Upvotes: 3
Reputation: 52602
"Undefined behaviour" means "anything can happen". Common values of "anything" are "nothing bad happens at all" and "your code crashes". Other common values of "anything" are "bad things happen when you turn optimisation on", or "bad things happen when you don't run the code in development but a customer is running it", and still other values are "your code does something unexpected" and "your code does something that it shouldn't be able to do".
So if you say "it sounds illogical that C would check if a pointer is out-of-bound without the pointer being used", you are in very, very, very dangerous territory. Take this code:
int a = 0;
int b [2] = { 1, 2 };
int* p = &a;
p - 1;
printf ("%d\n", *p);
The compiler can assume that there is no undefined behaviour. p - 1 was evaluated. The compiler concludes (legally) that either p = &a [1], p = &b [1] or p = &b [2], since in all other cases there is undefined behaviour either when evaluating p or when evaluating p-1. The compiler then assumes that *p is not undefined behaviour, so it concludes (legally) that p = &b [1] and prints the value 2. You didn't expect that, did you?
That's legal, and it happens. So the lesson is: Do NOT invoke undefined behaviour.
Upvotes: 4
Reputation: 108796
When specifications say something is undefined, that can be quite confusing.
It means, in that circumstance, the implementation of the specification is free to do whatever it wants. In some cases, it will do something that appears, intuitively, correct. In other cases it won't.
For address-boundary specs, I know my intuition comes from my assumptions about a flat uniform memory model. But there are other memory models.
The word "undefined" never appears in a completed spec unintentionally. Standards committees usually decide to use the word when they know different implementations of the standard need to do different things. In many cases the reason for the different things is performance. So: the appearance of the word in the spec is a red flag warning about that to us mere mortals, users of the spec, that our intuition may be wrong.
This kind of "whatever it wants" specification famously annoyed rms a few years back. So he made some versions of his GNU Compiler Collection (gcc) try to play a computer game when they encountered something undefined.
IBM used the word unpredictable in their specifications back in the 360 / 370 days. That's a better word. It makes the outcome sound more random and more dangerous. Within the scope of "unpredictable" behavior lie such problematic outcomes as "halt and catch fire."
Here's the thing, though. "Random" is a bad way to describe this kind of unpredictable behavior, because "random" implies the system may do something different each time it encounters the problem. If it does something different every time, you have a chance of catching the problem in test. In the world of "undefined" / "unpredictable" behavior the system does the same thing every time, until it doesn't. And, you know the time it doesn't will be years after you think you've finished testing your stuff.
So, when the spec says something is undefined, don't do that thing. Unless you're a friend of Murphy. OK?
Upvotes: 4
Reputation: 30597
To clarify, "Undefined Behavior" means that the outcome of the code in question is not defined in the standards governing the language. The actual outcome depends on the way in which the compiler is implemented, and can range from nothing at all to a complete crash and everything in between.
The standards do not specify that any range checking of pointers should occur. But in relation to your specific example, this is what they do say:
When an expression that has integer type is added to or subtracted from a pointer ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
The above quote is from C99 §6.5.6 Para 8 (the newest version I have on hand).
Note that the above also applies to non-array pointers, since in the previous clause it says:
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
So, if you perform pointer arithmetic and the result is either within bounds or points one past the end of the object, then you will get a valid result; otherwise you get undefined behavior. That behaviour might be that you end up with a stray pointer, but it might be something else.
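Here is a small sketch of how that plays out for a single (non-array) object; the variable names are made up, and the undefined cases are left commented out so the example itself stays well-defined:
int main(void) {
    int a;
    int *p = &a;     /* behaves like a pointer to the first element of an int[1] */
    int *q = p + 1;  /* one past the end: a valid pointer value, but *q must not be evaluated */
    /* int *r = p - 1;    before the start: undefined behavior, even with no dereference */
    /* int *s = p + 2;    more than one past the end: also undefined behavior */
    (void)q;         /* silence unused-variable warnings */
    return 0;
}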
Upvotes: 15
Reputation: 477408
Indeed, the behaviour of a C program is undefined if it attempts to compute, through pointer arithmetic, a value that is not a pointer to an element of, or one past the end of, the same array object. From C11 6.5.6/8:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
(For the purpose of this description, the address of an object of a type T may be treated as the address of the first element of an array T[1].)
Upvotes: 22
Reputation: 100170
Yes, it is undefined behavior even if the pointer is not dereferenced.
C only allows a pointer to point at most one element past the end of an array.
Upvotes: 7