Wandering Fool

Reputation: 2278

Do floats, doubles, and long doubles have a guaranteed minimum precision?

From my previous question "Is floating point precision mutable or invariant?" I received a response which said,

C provides DBL_DIG, DBL_DECIMAL_DIG, and their float and long double counterparts. DBL_DIG indicates the minimum relative decimal precision. DBL_DECIMAL_DIG can be thought of as the maximum relative decimal precision.

I looked these macros up; they are found in the header <cfloat>. The cplusplus.com reference page lists them for float, double, and long double.

Here are the macros for minimum precision values.

FLT_DIG 6 or greater

DBL_DIG 10 or greater

LDBL_DIG 10 or greater

If I took these macros at face value, I would assume that a float has a minimum decimal precision of 6, while a double and long double have a minimum decimal precision of 10. However, being a big boy, I know that some things may be too good to be true.

Therefore, I would like to know: do floats, doubles, and long doubles have guaranteed minimum decimal precision, and is this minimum decimal precision the values of the macros given above?

If not, why?


Note: Assume we are using programming language C++.

Upvotes: 6

Views: 2524

Answers (4)

Wandering Fool

Reputation: 2278

To be more specific: since my compiler uses the IEEE 754 standard, a float is guaranteed 6 to 9 significant decimal digits of precision and a double 15 to 17. Also, since long double on my compiler is the same size as double, it too has 15 to 17 significant decimal digits.

These ranges can be verified from the descriptions of the IEEE 754 single-precision binary floating-point format (binary32) and the IEEE 754 double-precision binary floating-point format (binary64), respectively.

Upvotes: 0

R Sahu

Reputation: 206577

Do floats, doubles, and long doubles have guaranteed minimum decimal precision, and is this minimum decimal precision the values of the macros given above?

I can't find any place in the standard that guarantees any minimal values for decimal precision.

The following quote from http://en.cppreference.com/w/cpp/types/numeric_limits/digits10 might be useful:

Example

An 8-bit binary type can represent any two-digit decimal number exactly, but 3-digit decimal numbers 256..999 cannot be represented. The value of digits10 for an 8-bit type is 2 (8 * std::log10(2) is 2.41)

The standard 32-bit IEEE 754 floating-point type has a 24 bit fractional part (23 bits written, one implied), which may suggest that it can represent 7 digit decimals (24 * std::log10(2) is 7.22), but relative rounding errors are non-uniform and some floating-point values with 7 decimal digits do not survive conversion to 32-bit float and back: the smallest positive example is 8.589973e9, which becomes 8.589974e9 after the roundtrip. These rounding errors cannot exceed one bit in the representation, and digits10 is calculated as (24-1)*std::log10(2), which is 6.92. Rounding down results in the value 6.

However, the C standard specifies the minimum values that need to be supported. From the C Standard:

5.2.4.2.2 Characteristics of floating types

...

9 The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater or equal in magnitude (absolute value) to those shown, with the same sign

...

-- number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits,

...

FLT_DIG 6
DBL_DIG 10
LDBL_DIG 10

Upvotes: 0

Cheers and hth. - Alf

Reputation: 145269

If std::numeric_limits<F>::is_iec559 is true, then the guarantees of the IEEE 754 standard apply to floating point type F.

Otherwise (and anyway), minimum permitted values of symbols such as DBL_DIG are specified by the C standard, which, indisputably for the library, “is incorporated into [the C++] International Standard by reference”, as quoted from C++11 §17.5.1.5/1.

Edit: As noted by TC in a comment here,

<climits> and <cfloat> are normatively incorporated by §18.3.3 [c.limits]; the minimum values are specified in turn in §5.2.4.2.2 of the C standard

Unfortunately for the formal view, that quote from C++11 comes from section 17.5, which is only informative, not normative. Moreover, the wording in the C standard stating that the values specified there are minimums is also in an informative section (the C99 standard's Annex E). So while it can be regarded as an in-practice guarantee, it's not a formal guarantee.


One strong indication that the in-practice minimum precision for float is 6 decimal digits, and that no implementation will give less: output operations default to precision 6, and this is normative text.

Disclaimer: It may be that there is additional wording that provides guarantees that I didn't notice. Not very likely, but possible.

Upvotes: 5

Jared Mulconry

Reputation: 489

The C++ Standard says nothing specific about limits on floating point types. You may interpret the incorporation of the C Standard "by reference" as you wish, but if you take the limits as specified there (N1570), section 5.2.4.2.2 paragraph 15 gives this example:

EXAMPLE 1 The following describes an artificial floating-point representation that meets the minimum requirements of this International Standard, and the appropriate values in a header for type float:
FLT_RADIX 16
FLT_MANT_DIG 6
FLT_EPSILON 9.53674316E-07F
FLT_DECIMAL_DIG 9
FLT_DIG 6
FLT_MIN_EXP -31
FLT_MIN 2.93873588E-39F
FLT_MIN_10_EXP -38
FLT_MAX_EXP +32
FLT_MAX 3.40282347E+38F
FLT_MAX_10_EXP +38

By this section, float, double and long double have these properties at the least.

Upvotes: -1
