shrike
shrike

Reputation: 4501

Safest and most efficient way to compute an integer operation that may overflow

Suppose we have 2 constants A & B and a variable i, all 64 bits integers. And we want to compute a simple common arithmetic operation such as:

i * A / B    (1)

To simplify the problem, let's assume that variable i is always in the range [INT64_MIN*B/A, INT64_MAX*B/A], so that the final result of the arithmetic operation (1) does not overflow (i.e.: fits in the range [INT64_MIN, INT64_MAX]).

In addition, i is assumed to be more likely in the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A] (i.e.: close to 0), however i may be (less likely) outside this range. In the first case, a trivial integer computation of i * A would not overflow (that's why we called the range friendly); and in the latter case, a trivial integer computation of i * A would overflow, leading to an erroneous result in computation of (1).

What would be the "safest" and "most efficient" way to compute operation (1) (where "safest" means: preserving exactness or at least a decent precision, and where "most efficient" means: lowest average computation time), provided i is more likely in the friendly range Range1.

At now, the solution currently implemented in the code is the following one :

(int64_t)((double)A / B * i)

which solution is quite safe (no overflow) though inaccurate (precision loss due to double significand 53 bits limitation) and quite fast because double division (double)A / B is precomputed at compile time, letting only a double multiplication to be computed at runtime.

Upvotes: 25

Views: 1088

Answers (5)

shrike
shrike

Reputation: 4501

In order to provide a quantified answer to the question, I made a benchmark of different solutions as part of the ones proposed here in this post (thanks to comments and answers).

The benchmark measures computation time of different implementations, when i is inside the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A], and when i is outside the friendly range (yet in the safe range Range2 = [INT64_MIN*B/A, INT64_MAX*B/A]).

Each implementation performs a "safe" (i.e. with no overflow) computation of the operation: i * A / B (except the 1st implementation, given as reference computation time). However, some implementations may return infrequent inaccurate computation result (which behavior is notified).

Some solutions proposed have not been tested or are not listed hereafter; these are: solution using __int128 (unsupported by ms vc compiler), yet boost int128_t has been used instead; solutions using extended 80 bits long double (unsupported by ms vc compiler); solution using InfInt (working and tested though too slow to be a decent competitor).

Time measurements are specified in ps/op (picoseconds per operation). Benchmark platform is an Intel Q6600@3GHz under Windows 7 x64, executable compiled with MS vc14, x64/Release target. The variables, constants and function referenced hereafter are defined as:

int64_t       i;
const int64_t A     = 1234567891;
const int64_t B     = 4321987;
inline bool   in_safe_range(int64_t i) { return (INT64_MIN/A <= i) && (i <= INT64_MAX/A); }
  1. (i * A / B) [reference]
    i in Range1: 1469 ps/op, i outside Range1: irrelevant (overflows)
  2. ((int64_t)((double)i * A / B))
    i in Range1: 10613 ps/op, i outside Range1: 10606 ps/op
    Note: infrequent inaccurate result (max error = 1 bit) in the whole range Range2
  3. ((int64_t)((double)A / B * i))
    i in Range1: 1073 ps/op, i outside Range1: 1071 ps/op
    Note: infrequent inaccurate result (max error = 1 bit) in the whole range Range2
    Note: compiler likely precomputes (double)A / B resulting in the observed performance boost vs previous solution.
  4. (!in_safe_range(i) ? (int64_t)((double)A / B * i) : (i * A / B))
    i in Range1: 2009 ps/op, i outside Range1: 1606 ps/op
    Note: rare inaccurate result (max error = 1 bit) outside Range1
  5. ((int64_t)((int128_t)i * A / B)) [boost int128_t]
    i in Range1: 89924 ps/op, i outside Range1: 89289 ps/op
    Note: boost int128_t performs dramatically bad on the bench platform (have no idea why)
  6. ((i / B) * A + ((i % B) * A) / B)
    i in Range1: 5876 ps/op, i outside Range1: 5879 ps/op
  7. (!in_safe_range(i) ? ((i / B) * A + ((i % B) * A) / B) : (i * A / B))
    i in Range1: 1999 ps/op, i outside Range1: 6135 ps/op

Conclusion
a) If slight computation errors are acceptable in the whole range Range2, then solution (3) is the fastest one, even faster than the direct integer computation given as reference.
b) If computation errors are unacceptable in the friendly range Range1, yet acceptable outside this range, then solution (4) is the fastest one.
c) If computation errors are unacceptable in the whole range Range2, then solution (7) performs as well as solution (4) in the friendly range Range1, and remains decently fast outside this range.

Upvotes: 4

bipll
bipll

Reputation: 11940

Not sure about value bounds, will (i / B) * A + (i % B) * A / B help?

Upvotes: 0

DarthGizka
DarthGizka

Reputation: 4675

If you cannot get better bounds on the ranges involved then you're best off following iammilind's advice to use __int128.

The reason is that otherwise you would have to implement the full logic of word to double-word multiplication and double-word by word division. The Intel and AMD processor manuals contain helpful information and ready-made code, but it gets quite involved, and using C/C++ instead of in assembler makes things doubly complicated.

All good compilers expose useful primitives as intrinsics. Microsoft's list doesn't seem to include a muldiv-like primitive but the __mul128 intrinsic gives you the two halves of the 128-bit product as two 64-bit integers. Based on that you can perform long division of two digits by one digit, where one 'digit' would be a 64-bit integer (usually called 'limb' because bigger than a digit but still only part of the whole). Still quite involved but lots better than using pure C/C++. However, portability-wise it is no better than using __int128 directly. At least that way the compiler implementers have already done all the hard work for you.

If your application domain can give you useful bounds, like that (u % d) * v will not overflow then you can use the identity

(u * v) / d = (u / d) * v + ((u % d) * v) / d

where / signifies integer division, as long as u is non-negative and d is positive (otherwise you might run afoul of the leeway allowed for the semantics of operator %).

In any case you may have to separate out the signs of the operands and use unsigned operations in order to find more useful mechanisms that you can exploit - or to circumvent sabotage by the compiler, like the saturating multiplication that you mentioned. Overflow of signed integer operations invokes undefined behaviour, compilers are free to do whatever they please. By contrast, overflow for unsigned types is well-defined.

Also, with unsigned types you can fall back on rules like that with s = a (+) b (where (+) is possibly-overflowing unsigned addition) you will have either s == a + b or s < a && s < b, which lets you detect overflow after the fact with cheap operations.

However, it is unlikely that you will get much farther on this road because the effort required quickly approaches - or even exceeds - the effort of implementing the double-limb operations I alluded to earlier. Only a thorough analysis of the application domain could give the information required for planning/deploying such shortcuts. In the general case and with the bounds you have given you're pretty much out of luck.

Upvotes: 6

iammilind
iammilind

Reputation: 69988

Some implementations permit __int128_t. Check if your implementation allows it, so that you can you may use it as placeholder instead of double. Refer below post:
Why isn't there int128_t?


If not very concerned about "fast"-ness, then for good portability I would suggest to use header only C++ library "InfInt".

It is pretty straight forward to use the library. Just create an instance of InfInt class and start using it:

InfInt myint1 = "15432154865413186646848435184100510168404641560358"; 
InfInt myint2 = 156341300544608LL;

myint1 *= --myint2 - 3;
std::cout << myint1 << std::endl;

Upvotes: 1

wilx
wilx

Reputation: 18228

I think you can detect the overflow before it happens. In your case of i * A / B, you are only worried about the i * A part because the division cannot overflow.

You can detect the overflow by performing test of bool overflow = i > INT64_MAX / A. You will have to do modify this depending on the sign of operands and result.

Upvotes: 2

Related Questions