Reputation: 5073

How to catch undefined behaviour without executing it?

In my software I am using the input values from the user at run time and performing some mathematical operations. Consider for simplicity below example:

int multiply(const int a, const int b)
{
    if(a >= INT_MAX || B >= INT_MAX)
        return 0;
    else
        return a*b;
}

I can check if the input values are greater than the limits, but how do I check if the result will be out of limits? It is quite possible that a = INT_MAX - 1 and b = 2. Since the inputs are perfectly valid, it will execute the undefined code which makes my program meaningless. This means any code executed after this will be random and eventually may result in crash. So how do I protect my program in such cases?

Upvotes: 3

Answers (3)

Patrick Schlüter

Reputation: 11841

Specifically for the multiplication of a by b the mathematically correct way to detect if it will overflow is to calculate log₂ of both values. If their sum is higher than the log₂ of the highest representable value of the result, then there is overflow.

log₂(a) + log₂(b) < log₂(UINT_MAX)

The difficulty is to calculate quickly the log₂ of an integer. For that, there are several bit twiddling hacks that can be used, like counting bit, counting leading zeros (some processors even have instructions for that). This site has several implementations https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious

The simplest implementation could be:

unsigned int log2(unsigned int v)
{
  unsigned int r = 0;

  while (v >>= 1) 
    r++;
  return r;
}

In your program you only need to check then

 if(log2(a) + log2(b) < MYLOG2UINTMAX)
   return a*b;
 else
   printf("Overflow");

The signed case is similar but has to take care of the negative case specifically.

EDIT: My solution is not complete and has an error which makes the test more severe than necessary. The equation works in reality if the log₂ function returns a floating point value. In the implementation I limited thevalue to unsigned integers. This means that completely valid multiplication get refused. Why? Because log2(UINT_MAX) is truncated log₂(UINT_MAX)=log₂(4294967295)≈31.9999999997 truncated to 31.

We have there for to change the implementation to replace the constant to compare to

#define MYLOG2UINTMAX (CHAR_BIT*sizeof (unsigned int))

Upvotes: 3

haccks

Reputation: 106022

You may try this:

if ( b > ULONG_MAX / a )        // Need to check a  != 0 before this division  
   return 0;                    //a*b invoke UB
else
   return a*b;

Upvotes: 2

Mats Petersson

Reputation: 129374

This really comes down to what you actually want to do in this case.

For a machine where long or long long (or int64_t) is a 64-bit value, and int is a 32-bit value, you could do (I'm assuming long is 64 bit here):

long x = static_cast<long>(a) * b;
if (x > MAX_INT || x < MIN_INT)
   return 0;
else
   return static_cast<int>(x);

By casting one value to long, the other will have to be converted as well. You can cast both if that makes you happier. The overhead here, above a normal 32-bit multiply is a couple of clock-cycles on modern CPU's, and it's unlikely that you can find a safer solution, that is also faster. [You can, in some compilers, add attributes to the if saying that it's unlikely to encourage branch prediction "to get it right" for the common case of returning x]

Obviously, this won't work for values where the type is as big as the biggest integer you can deal with (although you could possibly use floating point, but it may still be a bit dodgy, since the precision of float is not sufficient - could be done using some "safety margin" tho' [e.g. compare to less than LONG_INT_MAX / 2], if you don't need the entire range of integers.). Penalty here is a bit worse tho', especially transitions between float and integer isn't "pleasant".

Another alternative is to actually test the relevant code, with "known invalid values", and as long as the rest of the code is "ok" with it. Make sure you test this with the relevant compiler settings, as changing the compiler options will change the behaviour. Note that your code then has to deal with "what do we do when 65536 * 100000 is a negative number", and your code didn't expect so. Perhaps add something like:

 int x = a * b;
 if (x < 0) return 0;

[But this only works if you don't expect negative results, of course]

You could also inspect the assembly code generated and understand the architecture of the actual processor [the key here is to understand if "overflow will trap" - which it won't by default in x86, ARM, 68K, 29K. I think MIPS has an option of "trap on overflow"], and determine whether it's likely to cause a problem [1], and add something like

#if (defined(__X86__) || defined(__ARM__))
 #error This code needs inspecting for correct behaviour 
#endif
    return a * b;

One problem with this approach, however, is that even the slightest changes in code, or compiler version may alter the outcome, so it's important to couple this with the testing approach above (and make sure you test the ACTUAL production code, not some hacked up mini-example).

[1] The "undefined behaviour" is undefined to allow C to "work" on processors that have trapping overflows of integer math, as well as the fact that that a * b when it overflows in a signed value is of course hard to determine unless you have a defined math system (two's complement, one's complement, distinct sign bit) - so to avoid "defining" the exact behaviour in these cases, the C standard says "It's undefined". It doesn't mean that it will definitely go bad.

Upvotes: 5

How to catch undefined behaviour without executing it?

Answers (3)

Related Questions