B.K.

Reputation: 10172

A more accurate data type than float or double?

Does C++ have a more accurate data type than float or double, or do I just have to settle for the fact that my calculations will be off?

EDIT: As Mr. Lister has pointed out, my question is about precision. It's a bit frustrating when you add two floats/doubles together and the number is off half the time compared to the calculation done by hand.

Upvotes: 0

Views: 6745

Answers (2)

phuclv

Reputation: 42032

"It's a bit frustrating when you add two floats/doubles together and the number is off half the time compared to the calculation done by hand."

The same thing applies to any other type. You can't always get the correct result when you add two ints and store the sum in an int variable. It's even worse when adding two long long values, since there is no wider standard integer type to fall back on. When you add two floats or doubles, at least you lose only about 1 bit of precision at the least significant position and the result stays accurate. When an integer addition overflows, the most significant bits are lost: unsigned types give you only the result modulo 2^N, and signed overflow is undefined behavior in C++. Neither is what you expect in most cases.

That said, the best you can get in standard C++ is long double, provided that LDBL_MANT_DIG > DBL_MANT_DIG. Otherwise you have to resort to non-standard support.
For example, GCC and some other compilers have various floating-point extensions:

On x86 (and also Motorola 6888x and Intel i960), __float80 is the best choice due to hardware support. It's mapped to long double by default, unless you change that with the -mlong-double-64/80/128 and -m96bit-long-double/-m128bit-long-double compiler flags. On PowerPC the option is -mabi=ibmlongdouble/ieeelongdouble. Otherwise __float128 or _Decimal128 might be the suitable type.

In fact, C23 also added optional support for _Decimal32/64/128, so I guess they'll be added to C++ not long after. For now you can still compile a small module in C to use those new features and call it from C++. You still need to make sure that FLT128_DIG > LDBL_DIG, and be aware of the differences in number base and precision when converting between the types, among other things. Or, if performance isn't critical, just use decimal floats everywhere.

Upvotes: 1

Ross Bencina

Reputation: 4193

In some compilers, and on some architectures, "long double" will give you more precision than double. If you are on an x86 platform, the x87 FPU has an "extended" 80-bit floating point format. GCC and Borland compilers give you an 80-bit float value when you use the "long double" type. Note that Visual Studio does not support this (the maximum supported by MSVC is double precision, 64 bit).

There is something called a "double double", which is a software technique that stores a value as the unevaluated sum of two doubles to approximate quad-precision 128-bit floating point. You can find libraries that implement it.

You could also investigate libraries for arbitrary precision arithmetic.

For some calculations a 64 bit integer is a better choice than a 64 bit floating point value.

But if your question is about built-in types in current C++ compilers on common platforms then the answer is that you're limited to double (64 bit floating point), and on 64 bit platforms you have 64 bit ints. If you can stick to x86 and use the right compiler you can also have long double (80-bit extended precision).

You might be interested in this question:

long double (GCC specific) and __float128

Upvotes: 5
