Reputation: 2334
Now, some of you will be tempted to shout undefined behaviour, but there's a problem. The type int64_t is NOT defined by the C standard but by POSIX. POSIX defines this type as:
a signed integer type with width N, no padding bits, and a two's-complement representation.
It does not leave this for the implementation to define and most definitely does NOT allow it to be treated as an unbounded integer.
linux$ cat x.c
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
int stupid (int64_t a) {
    return (a+1) > a;
}

int main(void)
{
    int v;
    printf("%d\n", v = stupid(INT64_MAX));
    exit(v);
}
linux$ gcc -ox x.c -Wall && ./x
0
linux$ gcc -ox x.c -Wall -O2 && ./x # THIS IS THE ERROR.
1
linux$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
linux$ uname -a
Linux localhost 3.14.13-0-amd64 #1 SMP Sat Jul 26 20:03:23 BST 2014 x86_64 GNU/Linux
linux$ getconf LONG_BIT
32
linux$
Obviously, there's a problem here ... what is it?
Have I missed some sort of implicit cast?
Upvotes: 4
Views: 1234
Reputation: 881093
You don't need to go to POSIX to sort this out; ISO C controls this particular aspect in toto (references below are to the C11 standard).
The rest of this answer is going to go all "language lawyer" to show why it's undefined behaviour to add one to your signed value and therefore why both answers (true and false) are valid.
First, your contention that int64_t is not defined in ISO is not really correct. Section 7.20.1.1 Exact-width integer types states, when referring to intN_t types, that:
The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits. These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two's complement representation, it shall define the corresponding typedef names.
That's why you don't need to worry about POSIX defining those types in a certain way, since ISO defines them exactly the same (two's complement, no padding, etc) assuming it has the appropriate capabilities.
So, now that we've established ISO does define them (if they're available in the implementation), let's now look at 6.5 Expressions /5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Adding two identical integer types will definitely give you the same type (at least at the rank of int64_t, well above the point where integer promotions are done [1]), since this is dictated by the usual arithmetic conversions specified in 6.3.1.8. After the section dealing with the various floating point types (of which int64_t is not one), we see:
If both operands have the same type, then no further conversion is needed.
Earlier in that same section you find the statement that dictates the type of the result once a common type has been found:
Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result.
So, given that the result of INT64_MAX+1 won't actually fit into an int64_t variable, the behaviour is undefined.
Based on your comments that the encoding of int64_t indicates that adding one would wrap, you have to understand that this doesn't change the fact that the behaviour is undefined. An implementation is still free to do whatever it wants in this case, even if that doesn't make sense based on what you think.
And, in any case, the expression INT64_MAX + 1 > INT64_MAX (where the 1 is converted to int64_t by the usual arithmetic conversions) may well be simply compiled as 1, since that's arguably faster than actually incrementing a value and doing a comparison. And it is the correct result, since anything is the correct result :-)
In that sense, it's no different to an implementation converting:
    int ub (int i) { return i++ * i++; } // big no-no
    int x = ub (3);
into the simpler and almost certainly faster:
    int x = 0;
You may contend that the answer would be better as 9 or 12 (depending on when the ++ side effects are performed) but, given that undefined behaviour is a breaking of the contract between coder and compiler, the compiler is free to do whatever it wants.
In any case, if you want a well defined version of your function, you could simply opt for something like:
int stupid (int64_t a) {
    return (a == INT64_MAX) ? 0 : 1;
}
That gets you the desired/expected result without resorting to undefined behaviour :-)
[1] There may be an edge case here if the width of int is actually greater than 64 bits. In that case, it may well be that the integer promotions will force int64_t to an int, allowing the expression to be well defined. I haven't looked into that in great detail so may be wrong (in other words, don't consider it a gospel part of my answer) but it's worth keeping in mind as something to check if you ever get an implementation with an int more than 64 bits wide.
Upvotes: 5
Reputation: 97638
The specification of 2's complement representation is important because C provides both bit-wise and arithmetic operations on the same type, and thus it is important to define the relationship between a variable's bit-wise representation and its arithmetic interpretation.
For example, setting the Most Significant Bit of a 2's complement integer to 1 (a bit-wise operation, assuming the bit was previously 0) is equivalent to subtracting INT_MAX+1 from its value (an arithmetic operation).
These operations are not defined in terms of each other, though; they are defined according to mathematical logic. There is no such thing as "2's complement addition", because "2's complement" is a bit representation concept, and "addition" is an arithmetic one - the terms are from different domains.
As such, defining these relationships does not automatically define the result of individual operations which, within their own domain, result in values outside of the range of the type. For instance, a bit shift which requires more bits than have been defined in the representation is undefined, whatever its relationship to arithmetic value would be if it were possible. Within the domain of bit representation, the operation is simply impossible.
By the same token, an addition resulting in an arithmetic value larger than the maximum permitted by the type is undefined, regardless of what bit-wise operations would result if it were attempted. Within the domain of arithmetic, the result is well-defined, but impossible to store.
It is possible for a standard to define a behaviour for operations which are logically "impossible", for instance division by zero. In the case of overflow, a standard could define that this result in a "wraparound", affecting the most significant bits of the representation. Equally, it could say that this operation should result in a runtime error of some sort, or define it as resulting in the maximum possible value.
The C standard instead leaves this as undefined behaviour, meaning any behaviour a compiler chooses is equally valid, even if that varies at different optimisation levels or different parts of a program. In your case, there is no value of a for which asserting that a+1 > a is incorrect, so the compiler is free to assume that the inequality is true for all possible values, and simplify it at compile time.
Upvotes: 0
Reputation: 169523
C11 6.5 §5 applies:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
It is not the definition of the type that is the problem (even without POSIX, the exact-width integer types are already specified to use two's-complement by the C standard alone), but the operation 'signed addition' that results in an exceptional condition.
If you don't want that, use unsigned arithmetics instead. Note that if you want to avoid implementation-defined behaviour, the final conversion back to signed would need to be performed via type-punning instead of casting.
Personally I think that's overkill and would be fine writing the comparison as (int64_t)((uint64_t)a + 1) > a instead of going through the hassle of using unions or pointer casts.
Upvotes: 3
Reputation: 7320
I'm still going to shout UNDEFINED BEHAVIOR.
The reasoning here is simple: the compiler assumes that you're a perfect programmer and would never write any code that could possibly result in undefined behavior.
So when it sees your function:
    int stupid (int64_t a) {
        return (a+1) > a;
    }
It assumes that you're never going to call it with a == INT64_MAX, because that would be UB.
Therefore, this function can trivially be optimized to:
    int stupid (int64_t a) {
        return 1;
    }
Which can then be inlined as appropriate.
I suggest you read What Every C Programmer Should Know About Undefined Behavior for more explanations of how compilers exploit UB for optimizations purposes.
Upvotes: 8
Reputation: 60843
POSIX defines the width and representation of int64_t, but the behavior of arithmetic operators such as + is defined by the C standard. The C standard is clear that the behavior of arithmetic operators on signed values is undefined if the result would overflow.
Upvotes: 3