Cratylus

Reputation: 54074

Casting from double to float

I was reading that d == (double)(float)d, where d is a double, will not always evaluate to true.
That makes sense because we are casting to a type of lower precision, but I cannot understand the number given as an example.
The example was that if d is 1e40, the cast to float evaluates to +infinity.
But 1e40 written in binary is:

1110101100011001010011111000111000011010111001010010010111111101010111011100111110101011000010000000000000000000000000000000000000000

And the infinity is represented by the exponent all 1s and the fraction all 0s.
So how will casting reduce this specific example to infinity?

Upvotes: 1

Views: 3161

Answers (3)

T.C.

Reputation: 137301

It's worth noting that this may be undefined behavior depending on whether float supports positive and negative infinity. N1570 §6.3.1.5:

When a value of real floating type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined.

§5.2.4.2.2/p5:

The minimum range of representable values for a floating type is the most negative finite floating-point number representable in that type through the most positive finite floating-point number representable in that type. In addition, if negative infinity is representable in a type, the range of that type is extended to all negative real numbers; likewise, if positive infinity is representable in a type, the range of that type is extended to all positive real numbers.

If IEEE-754 floating point is used, then 1e40 is outside the range of representable finite numbers of binary32, and the conversion yields positive infinity in default rounding mode.
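To stay clear of the undefined case quoted above, the value can be checked against the finite float range before converting. The following is only a minimal sketch: the helper names `in_finite_float_range` and `survives_float_round_trip` are made up for illustration and are not part of any standard API.

```c
#include <float.h>
#include <stdbool.h>

/* True when d lies between -FLT_MAX and FLT_MAX, so the conversion
   to float is defined even on an implementation without infinities.
   NaN and the infinities compare false here. */
bool in_finite_float_range(double d)
{
    return d >= -FLT_MAX && d <= FLT_MAX;
}

/* The round-trip test from the question: d == (double)(float)d. */
bool survives_float_round_trip(double d)
{
    float f = (float)d;
    return d == (double)f;
}
```

Assuming IEEE-754 binary32 and binary64, `in_finite_float_range(1e40)` is false, and `survives_float_round_trip(1e40)` is also false because the float in the middle is +infinity.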

Upvotes: 3

ouah

Reputation: 145829

The maximum finite float (IEEE-754 binary32) value, FLT_MAX, is approximately 3.4028234 × 10^38. Since 1e40 is larger than that, converting the double value 1e40 to float yields positive infinity.
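A small sketch of how this could be observed, assuming float is IEEE-754 binary32 with the default rounding mode. The function name `float_cast_infinity_sign` is hypothetical and only serves the illustration.

```c
#include <float.h>
#include <math.h>

/* Reports what (float)d turned into:
    1 if the converted value is +infinity,
   -1 if it is -infinity,
    0 otherwise (finite or NaN).
   Assumes an implementation where float has infinities. */
int float_cast_infinity_sign(double d)
{
    float f = (float)d;
    if (isinf(f))
        return f > 0 ? 1 : -1;
    return 0;
}
```

Under that assumption, `float_cast_infinity_sign(1e40)` gives 1, while `float_cast_infinity_sign(1e38)` and `float_cast_infinity_sign(FLT_MAX)` give 0, since both are within the finite float range.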

Upvotes: 3

Pascal Cuoq

Reputation: 80255

If you try to convert the binary sequence in your question to float, the next step after writing it in binary is to “normalize” it, like so:

1.11010110001100101001111100011100001101011100101001001011111110… * 2^132

And after rounding to 24 significant bits:

1.11010110001100101010000 * 2^132

The exponent 132 is outside the acceptable range for exponents in the single-precision IEEE 754 binary representation. Exponents for normalized numbers go from -126 to +127, with a couple of exceptional values used to represent denormalized numbers (including 0), infinities and NaN.

This is why the number 1E40 ends up represented, as a float, as +inf, one of the special values encoded with one of the special exponents, and does not have the significand 1.110101100011… that you could have expected.
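The exponent of the normalized form can also be read off in code with the standard `frexp` function and compared with the largest exponent a normalized float allows. This is just a sketch: `binary_exponent` and `float_max_binary_exponent` are hypothetical helper names, and the comparison ignores the rare case where rounding the significand to 24 bits carries into the next exponent.

```c
#include <float.h>
#include <math.h>

/* Exponent E such that d = m * 2^E with 1 <= |m| < 2.
   Only meaningful for finite, nonzero d. */
int binary_exponent(double d)
{
    int e;
    (void)frexp(d, &e);   /* frexp gives d = m * 2^e with 0.5 <= |m| < 1 */
    return e - 1;         /* shift to the 1.xxx * 2^E convention */
}

/* Largest E of a normalized float: FLT_MAX_EXP - 1,
   which is 127 when float is IEEE-754 binary32. */
int float_max_binary_exponent(void)
{
    return FLT_MAX_EXP - 1;
}
```

For 1e40, `binary_exponent` returns 132, which is above the 127 returned by `float_max_binary_exponent` on a binary32 implementation. For 1e38 it returns 126, which still fits.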

Upvotes: 2
