java Float.MAX_VALUE to Double

Question

I have this code:

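public class Main {
    public static void main(String[] args) {
        float a = Float.MAX_VALUE;
        double b = (double) a;      // widen the float to a double
        b++;                        // add 1.0 in double arithmetic
        System.out.println(b == a); // a is widened to double for the comparison
    }
}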

and it prints true. Could anyone explain why?

Eric Postpischil · Accepted Answer

The precision of double is not capable of representing the difference between Float.MAX_VALUE and Float.MAX_VALUE+1, so a rounded result is returned. That rounded result is Float.MAX_VALUE.

Float.MAX_VALUE is 2¹²⁸−2¹⁰⁴. (Note that this is 2¹²⁷+2¹²⁶+2¹²⁵+…+2¹⁰⁴. That is, it is the sum of all powers of two from 2¹²⁷ to 2¹⁰⁴. In binary, it has 24 one bits, which is the number of bits in the significand¹ of a float. Mathematically, it equals 2¹²⁸−2¹⁰⁴.)
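As a quick check of these figures, the sketch below converts Float.MAX_VALUE to an exact integer and inspects its bits. The class and method names (FloatMaxValueBits, floatMaxExact) are made up for this illustration; only standard java.math calls are used.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class FloatMaxValueBits {
    // Exact integer value of Float.MAX_VALUE.
    // Widening float to double is exact, and new BigDecimal(double) is exact too.
    static BigInteger floatMaxExact() {
        return new BigDecimal((double) Float.MAX_VALUE).toBigIntegerExact();
    }

    public static void main(String[] args) {
        BigInteger max = floatMaxExact();
        BigInteger expected =
            BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE.shiftLeft(104));

        System.out.println(max.equals(expected));  // true: 2^128 - 2^104
        System.out.println(max.bitCount());        // 24 one bits
        System.out.println(max.bitLength());       // 128, so the highest bit is 2^127
        System.out.println(max.getLowestSetBit()); // 104, so the lowest bit is 2^104
    }
}
```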

When you add one to this, the mathematical result is of course 2¹²⁸−2¹⁰⁴+1. This is not representable in double, because the significand of a double is 53 bits, but from 2¹²⁷ to 1 is 128 bits. You cannot fit bits for both 2¹²⁷ and 1 inside the significand of a double. When a result is not representable, the nearest representable number is returned.

The representable number just below the mathematical result is 2¹²⁸−2¹⁰⁴, and the representable number just above the mathematical result is 2¹²⁸−2¹⁰⁴+2⁷⁵. (Note that from 2¹²⁷ to 2⁷⁵ is 53 bits, so 2⁷⁵ is the smallest power of 2 that fits in a 53-bit significand where the largest bit is being scaled to 2¹²⁷. Thus, we calculated this next number above 2¹²⁸−2¹⁰⁴ by adding the smallest amount to it that fits in the significand.) So we have two candidates:

2¹²⁸−2¹⁰⁴, which is 1 away from 2¹²⁸−2¹⁰⁴+1.
2¹²⁸−2¹⁰⁴+2⁷⁵, which is 2⁷⁵−1 away from 2¹²⁸−2¹⁰⁴+1.

The former is closer, so it is chosen to be the computed result. Thus, in double, adding one to 2¹²⁸−2¹⁰⁴ produces 2¹²⁸−2¹⁰⁴.
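The rounding can also be observed directly. The following is a minimal sketch, assuming default round-to-nearest double arithmetic; the class and helper names (RoundingAtFloatMax, exact, ulpAtFloatMax, gapAbove) are invented for the example. It uses Math.ulp and Math.nextUp to look at the spacing of doubles around Float.MAX_VALUE.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class RoundingAtFloatMax {
    // Exact integer value of a double that holds a whole number.
    static BigInteger exact(double d) {
        return new BigDecimal(d).toBigIntegerExact();
    }

    // Spacing between adjacent doubles at Float.MAX_VALUE.
    static double ulpAtFloatMax() {
        return Math.ulp((double) Float.MAX_VALUE);
    }

    // Exact distance from the mathematical sum Float.MAX_VALUE + 1
    // up to the next representable double above Float.MAX_VALUE.
    static BigInteger gapAbove() {
        double b = Float.MAX_VALUE;
        BigInteger sum = exact(b).add(BigInteger.ONE);
        return exact(Math.nextUp(b)).subtract(sum);
    }

    public static void main(String[] args) {
        double b = Float.MAX_VALUE;
        BigInteger expectedGap = BigInteger.ONE.shiftLeft(75).subtract(BigInteger.ONE);

        System.out.println(ulpAtFloatMax() == Math.scalb(1.0, 75)); // true: spacing is 2^75
        System.out.println(b + 1 == b);                             // true: rounds back down
        System.out.println(b + Math.scalb(1.0, 75) == b);           // false: a full step is representable
        System.out.println(gapAbove().equals(expectedGap));         // true: 2^75 - 1 away
    }
}
```

The distance below the sum is 1 and the distance above is 2⁷⁵−1, so the lower neighbor wins by a wide margin.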

Footnote

¹ The representation of a binary floating-point number has three parts: a sign s that is +1 or −1, a significand f that is a fixed-point number with a fixed number of bits, and an exponent e, such that the number represented is s • f • 2^e. The significand can be thought of just as an integer with a certain number of bits, but it is often scaled by adjusting the exponent so that the significand of normal floating-point numbers is in [1, 2). For example, 132 could be thought of as the significand 100001₂ times 2² or as 1.00001₂ times 2⁷.
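To make the 132 example concrete, here is a small sketch that pulls the exponent and both views of the significand out of a double. The class and method names are hypothetical, and the helpers assume a normal, nonzero argument (no zeros, subnormals, infinities, or NaN).

```java
class SignificandDemo {
    // Exponent e such that |d| = f * 2^e with f in [1, 2).
    static int exponentOf(double d) {
        return Math.getExponent(d);
    }

    // Significand scaled into [1, 2).
    static double scaledSignificand(double d) {
        return Math.scalb(Math.abs(d), -Math.getExponent(d));
    }

    // Significand viewed as an integer, with trailing zero bits dropped.
    static long integerSignificand(double d) {
        long bits = Double.doubleToLongBits(d);
        // 52 stored fraction bits plus the implicit leading 1 bit.
        long f = (bits & ((1L << 52) - 1)) | (1L << 52);
        return f >>> Long.numberOfTrailingZeros(f);
    }

    public static void main(String[] args) {
        double x = 132.0;
        System.out.println(exponentOf(x));                              // 7
        System.out.println(scaledSignificand(x));                       // 1.03125, i.e. 1.00001 in binary
        System.out.println(Long.toBinaryString(integerSignificand(x))); // 100001
    }
}
```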
