jweaver

Reputation: 17

C++ int64 * double == off by one

Below is the code I've tested in both a 64-bit and a 32-bit environment. The result is off by exactly one each time. The expected result is 1180000000, with the actual result being 1179999999. I'm not sure exactly why, and I was hoping someone could educate me:

#include <stdint.h>
#include <iostream>

using namespace std;

int main() {
  double odds = 1.18;
  int64_t st = 1000000000;
  int64_t res = st * odds;
  cout << "result: " << res << endl;
  return 1;
}

I appreciate any feedback.

Upvotes: 2

Views: 758

Answers (2)

M.M

Reputation: 141574

First of all, 1.18 is not exactly representable in double. Mathematically, the result of:

double odds = 1.18;

is 1.17999999999999993782751062099 (according to an online calculator).

So, mathematically, odds * st is 1179999999.99999993782751062099.

But in C++, odds * st is an expression with type double. So your compiler has two options for implementing this:

  • Do the computation in double precision
  • Do the computation in higher precision and then round the result to double

Apparently, doing the computation in double precision under IEEE 754 results in exactly 1180000000.

However, doing it in long double precision produces something more like 1179999999.99999993782751062099.

Converting this to double is now implementation-defined as to whether it selects the next-highest or next-lowest value, but I believe it is typical for the next-lowest to be selected.

Then converting this next-lowest result to integer will truncate the fractional part.


There is an interesting blog post here where the author describes the behaviour of GCC:

  • It uses long double intermediate precision for x86 code (due to the x87 FPU's long double registers)
  • It uses actual types for x64 code (because the SSE/SSE2 FPU supports this more naturally)

According to the C++11 standard, you should be able to inspect which intermediate precision is being used by outputting FLT_EVAL_METHOD from <cfloat>. 0 would mean the actual operand types are used, 2 would mean long double is being used.

Upvotes: 2

roeland

Reputation: 5741

1.18, or 118 / 100, can't be exactly represented in binary; it has infinitely repeating digits, just as 1 / 3 does when written in decimal.

So let's go over a similar case in decimal, let's calculate (1 / 3) × 30000, which of course should be 10000:

  • odds = 1 / 3 and st = 30000

    Since computers have only limited precision, we have to truncate this number to a limited number of decimals, let's say 6, so:

  • odds = 0.333333

  • 0.333333 × 30000 = 9999.99. The cast (which in your program is implicit) will truncate this number to 9999.

There is no 100% reliable way to work around this. float and double simply have limited precision. Dealing with this is a hard problem.

Your program contains an implicit conversion from double to an integer on the line int64_t res = st * odds;. Many compilers will warn you about this, and it can be the source of bugs of the type you are describing. This conversion, which can be written explicitly as (int64_t) some_double, rounds the number towards zero.

An alternative is rounding to the nearest integer with std::round(some_double) from <cmath>. That will, in this case, give the expected result.

Upvotes: 7
