Reputation: 69
#include <iostream>
#include <limits>
#include <string>
int main()
{
double d = std::numeric_limits<double>::max();
std::cout << std::to_string(d) << std::endl;
std::cout << std::to_string(d - 1) << std::endl;
}
[test@arch_host ~]$ g++ test.cpp
[test@arch_host ~]$ ./a.out
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
Why doesn't the second one end with a 7?
Upvotes: 4
Views: 395
Reputation: 309
A double in C++ is most often the IEEE 754 binary64 format, so this answer is based on that. The same reasoning applies to other floating-point formats, such as float (binary32), binary16 (which has no native C++ type), or even non-IEEE 754 formats. A double has a 52-bit mantissa and an 11-bit exponent, so its precision is "dynamic" and depends on the magnitude of the value:
If the exponent is big: the mantissa bits all represent whole numbers, and past a certain magnitude it can no longer represent, for example, a trailing 2, because the mantissa has a limited number of digits.
If the exponent is small: the mantissa represents smaller and smaller negative powers of 2 (1/2, 1/4, etc.), so the value is more precise.
Onto the question itself: when you do not select one of the rounding modes defined in IEEE 754, the default is "round to nearest, ties to even": the result is the representable value closest to the exact result, and on a tie the candidate with an even mantissa wins.
When you operate on the maximum double value, the gap between it and the next smaller representable double is huge. So subtracting 1.0 algebraically gives maxDouble - 1.0, but that value is not representable in hardware: 1.0 is far too small for the exponent (the change would fall well below the last mantissa bit). Your FPU therefore applies the round-to-nearest-even mode and rounds the result back to maxDouble.
To work around this, there are two options. Use fixed-point arithmetic if the range of values your program computes is not too big and calculation speed isn't a hard requirement. Or change the rounding mode to round toward 0 or round down, either with std::fesetround from <cfenv> or with the intrinsics provided by your CPU manufacturer (usually found in a header file). The subtraction then yields the next smaller representable double instead of maxDouble, although still not exactly maxDouble - 1.0.
Oh, and here's a short list of the rounding modes: round to nearest even, round up, round down, and round toward 0, which equates to rounding up or down depending on the sign of the result.
And if you are going to work with floating-point values that high, you should periodically check whether your number has saturated to ∞ or -∞, because from then on further operations no longer give meaningful results.
Upvotes: 2
Reputation: 1243
A double typically uses the IEEE 754 standard for its representation. Unlike an int, it does not have a fixed minimal step size of 1. Instead, the bigger the number gets, the bigger the minimal step size gets:
The value of your number is so big that the step size is much bigger than 1. To keep it simple, let's assume it is 10. So if you try to subtract 1, the value gets rounded to the nearest valid double, which is the same as before.
In other words: 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858367 is not a valid double.
Upvotes: 7