Reputation: 23
I'm writing an algorithm that deals with logit-normal distribution transformations, which have the characteristic that, with large enough sigmas/means, the values can be "compressed" very tightly towards the edges at 0 and 1. I'm encountering the problem that values which should be different collide because a C++ double can't represent enough digits. For example, when I apply the sigmoid/logistic function to two sufficiently large but different values, the outcomes are identical, although they wouldn't be if doubles could hold higher precision.
I can deal with that by aggregating samples that collide onto the same value. The real issue is the asymmetry: around 1 the collisions happen, because numbers are represented like 0.999999999953, but around 0 they don't, because C++ starts to represent values in scientific notation, like 0.12949384e-300. The latter achieves much higher precision, so no collisions happen there. But I need symmetrical behavior around both 0 and 1, to make sure the result is computed the same way on both sides.
Is there a type in C++ that does not automatically switch to scientific notation when representing extremely small values, so I could get the same behavior around 0 that I get around 1?
Upvotes: -1
Views: 208
Reputation: 42002
Is there such a type in C++ that does not automatically switch to scientific notations when representing extremely small values?
There's no such thing as "automatically switching to scientific notation when representing extremely small values". The output format is specified by the programmer. In C++ it can be controlled by I/O manipulators, std::format, or printf format specifiers. For example, you can use std::fixed << std::setprecision(800) to tell std::cout not to use scientific notation for very small or very large values and to print 800 digits of precision.
That means saying "C++ starts to represent values in scientific notation" is wrong. What you're seeing is just the effect of the format in use, such as %e or %g in printf (the default std::cout formatting behaves like %g). However, changing the output format won't help you, because the real problem you're having is the lack of precision and/or range.
C++ floating-point types always have a fixed format and fixed precision. They're stored internally in the form (-1)^sign × M × base^N, which can be seen as similar to scientific notation when base = 10. Due to the nature of that formula, the representable numbers are not uniformly spaced: the absolute gap between two consecutive representable values grows with the magnitude, while the relative spacing stays roughly constant across the normal range (in IEEE-754, thanks to the implied leading 1 bit of the significand). Subnormals (denormals) are the exception and have a fixed absolute spacing, so you lose precision gradually as you approach zero; moreover, if you go into the subnormal zone you'll have serious performance issues on many CPUs.
One way to fix this, if you choose to stick with the double type, is to rearrange the formula using techniques and functions that are designed for cases like this: std::log and std::exp, plus their relatives std::log1p and std::expm1, which avoid cancellation near 1. You'll need to rewrite the function into separate variants for values near 1 and values far from it, and pick the variant that minimizes the error. Also consider std::fma for fused multiply-add/subtract operations. There are other functions that improve precision in specific use cases, like std::cbrt, std::hypot and std::exp2, although I'm not sure whether they help you here.
To get higher precision and range (because you need values very close to 0) you have to use a custom type. The most commonly used formats have the same form as above, but with a larger significand and/or exponent:
boost::multiprecision::mpfr_float, boost::multiprecision::gmp_float, boost::multiprecision::cpp_bin_float, boost::multiprecision::cpp_dec_float... There are also some other floating-point formats, but as of now they're not hardware-accelerated, so for your use case fixed-precision or double-double arithmetic is the best choice.
Upvotes: 0
Reputation: 4474
You can use the type long double. On x86 it typically uses the 80-bit extended format of the original 8087 coprocessor, with a 64-bit significand.
Note that Microsoft dropped support for it long ago: with MSVC, long double is the same 64-bit type as double.
GCC supports the type __float128, which implements IEEE-754 quadruple precision. This is nice since, if you use Linux, it's probably already there.
Boost.Multiprecision offers a 128-bit cross-platform facade for the above. However, you would typically bring the Boost library dependency into your application, which can be daunting in certain scenarios. UPDATE: Boost.Multiprecision's float128 does not compile on MSVC/Windows.
Example:
#include <iostream>
#include <limits>
#include <cmath>
#include <boost/format.hpp>
#include <boost/multiprecision/float128.hpp>

namespace mp = boost::multiprecision;

template <typename T>
[[gnu::noinline]] T calc(T a, T b, T c) {
    return T(1) - (T(1) / a + T(1) / b + T(1) / c);
}

int main() {
    std::cout << " Epsilon Error" << std::endl;
    boost::format fmt("%12.6g %12.6g");
    std::cout << fmt % std::numeric_limits<double>::epsilon() % calc<double>(3, 2, 6) << std::endl;
    std::cout << fmt % std::numeric_limits<long double>::epsilon() % calc<long double>(3, 2, 6) << std::endl;
    std::cout << fmt % double(std::numeric_limits<__float128>::epsilon()) % double(calc<__float128>(3, 2, 6)) << std::endl;
    std::cout << fmt % std::numeric_limits<mp::float128>::epsilon() % calc<mp::float128>(3, 2, 6) << std::endl;
}
Results in
Program stdout
Epsilon Error
2.22045e-16 1.11022e-16
1.0842e-19 0
0 9.62965e-35
1.92593e-34 9.62965e-35
However, note that there is a huge performance penalty for 128-bit precision. I wrote a benchmark using the above formula; the results are:
-------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------
Calc<double> 375575 ns 375514 ns 1864
Calc<long double> 531313 ns 531228 ns 1318
Calc<__float128> 12943995 ns 12941473 ns 54
Calc<mp::float128> 13074108 ns 13071816 ns 54
Upvotes: 1