Reputation: 201
The situation is the following:
Say that the signed binary integer 11111111001101100000101011001000 is simply negative due to an overflow. This is a practical, existing problem, since you might want to allocate more bytes than you can describe in a 32-bit integer. But then it gets read in as a 64-bit integer.
Now, on a 64-bit machine, which statement is correct (if any at all):
1. malloc reads this as a 64-bit integer, finding 11111111001101100000101011001000################################, with # being a wildcard bit representing whatever data is stored after the original integer. In other words, it reads a result close to its maximum value of 2^64 and tries to allocate some quintillion bytes. It fails.
2. malloc reads this as a 64-bit integer, casting to 0000000000000000000000000000000011111111001101100000101011001000, possibly because that is how it is loaded into a register, leaving a lot of bits zero. It does not fail but allocates the negative memory as if reading a positive unsigned value.
3. malloc reads this as a 64-bit integer, casting to ################################11111111001101100000101011001000, possibly because that is how it is loaded into a register, with # a wildcard representing whatever data was previously in the register. It fails quite unpredictably depending on the last value.
I actually tested this, and the malloc failed (which would imply that either 1 or 3 is correct). I assume 1 is the most logical answer. I also know the fix (using size_t as input instead of int).
I'd just really like to know what actually happens. For some reason I can't find any clarification on how 32-bit integers are actually treated on 64-bit machines for such an unexpected 'cast'. I'm not even sure whether it being in a register actually matters.
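For reference, here is a minimal sketch (not the actual test from the question) of how such an overflowed value can arise and reach malloc. It assumes a common 64-bit platform with a 32-bit int, a 64-bit size_t, and two's complement wraparound in practice; the alloc_pixels wrapper and the image dimensions are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: taking the byte count as int is the bug. */
static void *alloc_pixels(int nbytes)
{
    return malloc(nbytes);   /* the int argument is converted to size_t here */
}

int main(void)
{
    int width = 30000, height = 30000, bytes_per_pixel = 4;

    /* 30000 * 30000 * 4 = 3,600,000,000 does not fit in a 32-bit int:
       the multiplication overflows (undefined behavior); on common
       two's complement implementations it wraps to a negative value. */
    int nbytes = width * height * bytes_per_pixel;
    printf("nbytes = %d\n", nbytes);        /* typically -694967296 */

    void *p = alloc_pixels(nbytes);
    printf("allocation %s\n", p ? "succeeded" : "failed");
    free(p);

    /* The fix mentioned above: do the size arithmetic in size_t. */
    size_t good = (size_t)width * height * bytes_per_pixel;
    void *q = malloc(good);                 /* requests 3,600,000,000 bytes */
    printf("malloc(%zu) %s\n", good, q ? "succeeded" : "failed");
    free(q);
    return 0;
}
```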
Upvotes: 19
Views: 6360
Reputation: 726669
Once an integer overflows, using its value results in undefined behavior. A program that uses the result of an int after the overflow is invalid according to the standard -- essentially, all bets about its behavior are off.
With this in mind, let's look at what's going to happen on a computer where negative numbers are stored in two's complement representation. When you add two large 32-bit integers on such a computer, you get a negative result in case of an overflow.
However, according to the C++ standard, the type of malloc's argument, i.e. size_t, is always unsigned. When you convert a negative number to an unsigned number, it gets sign-extended (see this answer for a discussion and a reference to the standard), meaning that the most significant bit of the original (which is 1 for all negative numbers) is set in the top 32 bits of the unsigned result.
Therefore, what you get is a modified version of your third case, except that instead of "wildcard bit #" it has ones all the way to the top. The result is a gigantic unsigned number (roughly 16 exbibytes or so); naturally malloc fails to allocate that much memory.
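To make the sign extension visible, here is a small sketch using the exact bit pattern from the question (0xFF360AC8). It assumes a two's complement machine with a 32-bit int and a 64-bit size_t; the conversion of the out-of-range constant to int is implementation-defined, but mainstream compilers produce the expected bit pattern.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* 11111111001101100000101011001000 from the question is 0xFF360AC8;
       as a two's complement 32-bit signed integer it is negative. */
    int n = (int)0xFF360AC8u;        /* -13235512 on common compilers */

    /* Converting the negative value to the unsigned 64-bit size_t reduces
       it modulo 2^64, which fills the upper 32 bits with ones. */
    size_t s = (size_t)n;

    printf("n = %d\n", n);           /* -13235512 */
    printf("s = 0x%zx\n", s);        /* 0xffffffffff360ac8 */

    void *p = malloc(s);             /* requests roughly 16 EiB */
    printf("malloc %s\n", p ? "succeeded" : "failed (as expected)");
    free(p);
    return 0;
}
```

This is the all-ones variant of the question's third case: the upper half is not leftover register garbage but a deterministic consequence of the value conversion.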
Upvotes: 13
Reputation: 158499
If we have a specific code example, a specific compiler, and a specific platform, we can probably determine what the compiler is doing; this is the approach taken in Deep C. But even then it may not be fully predictable, which is a hallmark of undefined behavior, so generalizing about undefined behavior is not a good idea.
We only have to take a look at the gcc documentation to see how messy it can get. It offers some good advice on integer overflow, which says:
In practice many portable C programs assume that signed integer overflow wraps around reliably using two's complement arithmetic. Yet the C standard says that program behavior is undefined on overflow, and in a few cases C programs do not work on some modern implementations because their overflows do not wrap around as their authors expected.
and, in the sub-section Practical Advice for Signed Overflow Issues, it says:
Ideally the safest approach is to avoid signed integer overflow entirely.[...]
At the end of the day it is undefined behavior and therefore unpredictable in the general case, but in the case of gcc, their section on implementation-defined Integer behavior says that values wrap around modulo 2^N on conversion:
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
but in their advice about integer overflow they explain how optimization can cause problems with wraparound:
Compilers sometimes generate code that is incompatible with wraparound integer arithmetic.
So this quickly gets complicated.
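As a concrete illustration of that last warning, here is a small sketch of the kind of check the gcc documentation is talking about; the exact outcome depends on the compiler version and optimization level, so the comments describe typical rather than guaranteed behavior.

```c
#include <limits.h>
#include <stdio.h>

/* A wraparound-style overflow check: it relies on x + 1 wrapping to a
   smaller value when x == INT_MAX. Because signed overflow is undefined,
   an optimizing compiler is allowed to assume x + 1 > x always holds and
   fold this function to "return 0". */
static int next_would_overflow(int x)
{
    return x + 1 < x;
}

int main(void)
{
    /* Typically prints 1 at -O0 and 0 at -O2 with gcc; neither result is
       guaranteed by the standard. With -fwrapv, gcc defines signed overflow
       as wrapping and the check works as intended. */
    printf("%d\n", next_would_overflow(INT_MAX));
    return 0;
}
```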
Upvotes: 3
Reputation: 299940
The problem with your reasoning is that it starts from the assumption that the integer overflow will result in a deterministic and predictable operation.
This, unfortunately, is not the case: undefined behavior means that anything can happen, and notably that compilers may optimize as if it could never happen.
As a result, it is nigh impossible to predict what kind of program the compiler will produce if there is such a possible overflow.
The value that actually reaches malloc may be anything from 0 to size_t(-1), and thus it may allocate either too little or too much memory, or even fail to allocate, ...
Undefined Behavior => All Bets Are Off
Upvotes: 18