Reputation: 10172
Does C++ have a more accurate data type than float or double, or do I just have to settle for the fact that my calculations will be off?
EDIT: As Mr. Lister has pointed out, my question is about precision. It's a bit frustrating when you add two floats/doubles together and the number is off half the time compared to the calculation done by hand.
Upvotes: 0
Views: 6745
Reputation: 42032
It's a bit frustrating when you add two floats/doubles together and the number is off half the time compared to the calculation done by hand.
The same thing applies to any other type. Adding two ints and storing the sum in an int variable is not guaranteed to give the correct result either, because the sum can overflow. It's even worse when adding two long long values without an extended integer type. When you add two floats or doubles, the result is only rounded at the least significant position, so you typically lose about 1 bit of precision and still retain the accuracy. Integer overflow discards the most significant bits instead, so you only get the result modulo 2^N (for unsigned types; signed overflow is undefined behavior), which is not what you expect in many cases.
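To make the comparison concrete, here is a minimal sketch. The helper names wrapped_sum and rounding_error are made up for illustration, and the unsigned type is used on purpose because unsigned wraparound is well defined.

```cpp
#include <cmath>
#include <cstdint>

// Unsigned addition wraps modulo 2^32, so the most significant bit of an
// out-of-range sum is silently discarded.
std::uint32_t wrapped_sum(std::uint32_t a, std::uint32_t b) {
    return a + b;
}

// Absolute difference between a double addition and a reference value.
// For inputs of similar magnitude this stays near the last significand bit.
double rounding_error(double a, double b, double expected) {
    return std::fabs((a + b) - expected);
}
```

With these, wrapped_sum(4000000000u, 1000000000u) comes back as 705032704 rather than 5000000000, while rounding_error(0.1, 0.2, 0.3) is a tiny value far below 1e-15.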
That said, the best thing you can have in standard C++ is long double, provided LDBL_MANT_DIG > DBL_MANT_DIG; otherwise you have to resort to non-standard support.
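A quick way to check that condition on your own toolchain could look like the sketch below. The two function names are hypothetical; the macros and std::numeric_limits are standard.

```cpp
#include <cfloat>
#include <limits>

// True when long double carries more significand bits than double
// on the current compiler and target.
constexpr bool long_double_is_wider() {
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

// Significand width of long double in bits, via the C++ interface.
// On binary floating-point targets this matches LDBL_MANT_DIG.
constexpr int long_double_mantissa_bits() {
    return std::numeric_limits<long double>::digits;
}
```

Since both functions are constexpr you can also use them in a static_assert to fail the build on targets where long double is just double.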
For example, GCC and some other compilers have various floating-point extensions: __float128 for IEEE-754 binary128, __float80 for 80-bit extended precision, __ibm128 for double-double arithmetic on PowerPC, _Float128/_Float64x, and _Decimal32/_Decimal64/_Decimal128.
On x86 (and also Motorola 6888x and Intel i960) __float80 is the best choice due to hardware support, and it's mapped to long double by default, unless you change the -mlong-double-64/80/128 and -m96/128bit-long-double compiler flags. On PowerPC the option is -mabi=ibmlongdouble/ieeelongdouble. Otherwise __float128 or _Decimal128 might be the suitable type.
In fact C23 also added optional support for _Decimal32/64/128, so I guess it won't be long before they're added to C++. For now you can still compile a small module in C that uses those new features and call it from C++. You still need to make sure that FLT128_DIG > LDBL_DIG and be aware of the differences in number base and precision when converting between them, among other things. Or, if performance isn't critical, just use decimal floats throughout.
Upvotes: 1
Reputation: 4193
In some compilers, and on some architectures, "long double" will give you more precision than double. If you are on an x86 platform, the x87 FPU has an "extended" 80-bit floating point format. GCC and Borland compilers give you an 80-bit float value when you use the "long double" type. Note that Visual Studio does not support this (the maximum supported by MSVC is double precision, 64 bit, and its long double uses that same format).
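There is something called a "double double", which is a software technique that stores a value as the sum of two doubles. It gives roughly twice the precision of double (about 106 significand bits), which is close to, though not the same as, quad-precision 128-bit floating point. You can find libraries that implement it.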
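To give an idea of how double-double works, here is a rough sketch of its core building block, the error-free addition usually credited to Knuth (TwoSum). The DoubleDouble struct and the add_double helper are names I made up, this is not any particular library's API, and it assumes strict IEEE double arithmetic (so no -ffast-math and no x87 excess precision).

```cpp
// Hypothetical minimal double-double value: the number represented is
// hi + lo, where lo holds the part that did not fit into hi.
struct DoubleDouble {
    double hi;
    double lo;
};

// TwoSum: hi is the rounded sum of a and b, lo is the exact rounding
// error, so a + b == hi + lo holds exactly (barring overflow).
DoubleDouble two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;                     // the part of b that made it into s
    double e = (a - (s - bb)) + (b - bb);  // what was rounded away
    return {s, e};
}

// Add a plain double to a double-double accumulator (simplified sketch).
DoubleDouble add_double(DoubleDouble x, double y) {
    DoubleDouble t = two_sum(x.hi, y);
    double lo = t.lo + x.lo;      // combine the low-order parts
    return two_sum(t.hi, lo);     // renormalise so lo stays small
}
```

For example, two_sum(1.0, 1e-20) keeps the 1e-20 in lo, whereas a plain 1.0 + 1e-20 just gives 1.0 and the small term is gone. A real library also needs multiplication, division, comparisons and I/O, which is where most of the work is.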
You could also investigate libraries for arbitrary precision arithmetic.
For some calculations a 64 bit integer is a better choice than a 64 bit floating point value.
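A typical case is money: if you count in whole cents with a 64-bit integer, every addition is exact. The sketch below uses two made-up helpers, sum_double and sum_cents, to compare the two approaches.

```cpp
#include <cstdint>

// Repeatedly add a binary floating-point step; each addition may round,
// and 0.1 itself is not exactly representable in binary.
double sum_double(double step, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += step;
    }
    return total;
}

// The same accumulation in integer units (e.g. cents); exact as long as
// the running total fits in 64 bits.
std::int64_t sum_cents(std::int64_t step_cents, int count) {
    std::int64_t total = 0;
    for (int i = 0; i < count; ++i) {
        total += step_cents;
    }
    return total;
}
```

On common IEEE-754 platforms sum_double(0.1, 10) typically lands slightly below 1.0, while sum_cents(10, 10) is exactly 100. The price is that you have to pick the unit up front and watch for overflow yourself.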
But if your question is about built-in types in current C++ compilers on common platforms then the answer is that you're limited to double (64 bit floating point), and on 64 bit platforms you have 64 bit ints. If you can stick to x86 and use the right compiler you can also have long double (80-bit extended precision).
You might be interested in this question:
long double (GCC specific) and __float128
Upvotes: 5