Why is it not possible to subtract 1 from max double

Question

#include <iostream>
#include <limits>
#include <string>

int main()
{
    double d = std::numeric_limits<double>::max();
    std::cout << std::to_string(d) << std::endl;
    std::cout << std::to_string(d - 1) << std::endl;
}

[test@arch_host ~]$ g++ test.cpp 
[test@arch_host ~]$ ./a.out 
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000

Why doesn't the second one end in 7?

Sachiko.Shinozaki · Accepted Answer

A double in C++ is most often the IEEE 754 binary64 format, so this answer is based on that. The reasoning also holds for other floating-point formats, such as float (binary32), binary16 (which has no native C++ type), and even non-IEEE 754 floating-point formats. A binary64 double has 1 sign bit, an 11-bit exponent and a 52-bit stored mantissa (53 bits of precision once the implicit leading 1 is counted), so its precision is "dynamic": the spacing between representable values depends on the exponent.

If the exponent is big: the mantissa only covers the leading digits of the integer part. Beyond 2^53 not every integer is representable, so the value cannot hold, for example, a trailing 2, because the mantissa has a limited number of digits.

If the exponent is small: the mantissa bits stand for smaller and smaller negative powers of 2 (1/2, 1/4, etc.), so the value is more precise in absolute terms.

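On to the question itself: when you do not select one of the rounding modes defined in IEEE 754, the default is "round to nearest, ties to even". The exact result is rounded to the closest representable value, and an exact tie goes to the candidate whose last mantissa bit is 0.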

When you operate on the maximum double value, the gap between it and the next smaller representable double is huge: 2^971, roughly 2 × 10^292. Subtracting 1.0 algebraically gives maxDouble - 1.0, but that value is not representable in hardware, because 1.0 is far too small for the exponent (the change would land far below the 52nd mantissa bit). Your FPU therefore applies the round-to-nearest-even mode, and since maxDouble - 1.0 is much closer to maxDouble than to the next smaller double, the result is maxDouble again.

To work around this, there are two options. Use fixed-point arithmetic if the range of values computed by your program is not too big and calculation speed is not a hard requirement. Or set the rounding mode to round toward 0 or round down, either with the intrinsics given by your CPU manufacturer (often found in a header file) or with std::fesetround from <cfenv>. Note that the result then becomes the next smaller representable double, not the exact value maxDouble - 1.

Oh, and here is a short list of the rounding modes: round to nearest even, round up, round down and round toward 0, which acts as round up or round down depending on the sign of the result.

And if you are going to work with values that high in floating-point arithmetic, you should periodically check whether your number has saturated to ∞ or -∞, because adding to it, subtracting from it or scaling it can no longer bring it back to a finite value: the result stays infinite or becomes NaN.
