koljan.818

Reputation: 69

Why is it not possible to subtract 1 from the maximum double?

#include <iostream>
#include <limits>
#include <string>   // std::to_string

int main()
{
    double d = std::numeric_limits<double>::max();
    std::cout << std::to_string(d) << std::endl;
    std::cout << std::to_string(d - 1) << std::endl;
}
[test@arch_host ~]$ g++ test.cpp 
[test@arch_host ~]$ ./a.out 
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000

Why doesn't the second one end in 7?

Upvotes: 4

Views: 395

Answers (2)

Sachiko.Shinozaki

Reputation: 309

A double in C++ is most often an IEEE 754 binary64 value, so this answer is based on that format. The same reasoning also applies to other floating-point formats, such as float (binary32), binary16 (which has no portable native C++ type), or even non-IEEE 754 formats. Because your double consists of a 52-bit mantissa (53 bits of precision, counting the implicit leading 1) and an 11-bit exponent, its precision is "dynamic": the spacing between neighbouring values depends on the exponent.

If the exponent is large, the mantissa bits represent only whole numbers, and beyond a certain size they cannot represent every integer: a change in the last digits, for example a trailing 1 or 2, is lost because the mantissa holds a limited number of digits.

If the exponent is small, the mantissa bits represent smaller and smaller negative powers of 2 (1/2, 1/4, etc.), so the value is more precise.

Now onto the question itself: unless you select one of the other rounding modes defined in IEEE 754, the default is "round to nearest, ties to even", which does exactly what it sounds like.

When you operate on the maximum double value, the gap between it and the next representable double below it is huge (2^971 for binary64). So subtracting 1.0 algebraically gives maxDouble - 1.0, but that value is not representable: at this exponent the change would have to appear far beyond the 52nd bit of the mantissa. Since the exact result is much closer to maxDouble than to its lower neighbour, your FPU's round-to-nearest-even mode rounds it back to maxDouble.

There are two ways to work around this. Use fixed-point arithmetic if the range of values your program computes is not too large and calculation speed is not a major requirement. Or set the rounding mode to round toward zero or round down, either with the intrinsics provided by your CPU manufacturer (usually declared in a header file) or with the standard std::fesetround from <cfenv>. Note that this gives you the next double below maxDouble, not maxDouble - 1 exactly, which still cannot be represented.

Oh, and here's a short list of the rounding modes: round to nearest even, round up (toward +∞), round down (toward -∞), and round toward zero, which amounts to rounding down or up depending on the sign of the result.

And if you are going to work with values this large in floating-point arithmetic, you should periodically check whether your number has saturated to +∞ or -∞, because once it has, further operations cannot bring it back to a finite value.

Upvotes: 2

Crigges

Reputation: 1243

A double uses the IEEE 754 standard for its representation. Unlike an int, it does not have a fixed minimal step size of 1. Instead, the bigger the number gets, the bigger the minimal step size gets.

Your number is so large that the step size is much bigger than 1 (for the maximum double it is 2^971). To keep it simple, let's assume it is 10. If you try to subtract 1, the result gets rounded to the nearest valid double, which is the same value you started with.

In other words, 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858367 is not a valid double.

Upvotes: 7
