Ankit

Reputation: 6980

Is it possible that a number exactly represented as float can NOT be exactly represented as double?

I have a question which arose from another question about precision of floating numbers.

Now, I know that decimal numbers cannot always be represented exactly in floating point, and hence they are stored as the closest floating-point number that can be represented.

My question is actually about the difference in representation of float and double.

Where does this question arise from?

Suppose I do:

System.out.println(.475d+.075d);

then the output is not 0.55 but 0.5499999999999999 (on my machine).

However, when I do :

System.out.println(.475f+.075f);

I get the correct answer, i.e. 0.55 (a little unexpected for me)

Until now I was under the impression that double has more precision than float (a double is accurate to a larger number of decimal places). So, if a value cannot be represented precisely as a double, then its float representation will also be stored inaccurately.

However the results I got are a little disturbing for me. I am confused if:

  1. I have an incorrect understanding of what precision means?
  2. float and double are represented differently, apart from the fact that double has more bits?

Upvotes: 3

Views: 2910

Answers (3)

supercat

Reputation: 81115

If one regards floating-point types as actually representing ranges of values rather than discrete values (e.g. 0.1f doesn't represent 13421773/134217728, but rather "something between 13421772.5/134217728 and 13421773.5/134217728"), conversions from double to float will usually be accurate, while conversions from float to double will usually not. Unfortunately, Java allows the usually-inaccurate conversions to be performed implicitly, while requiring a typecast in the usually-accurate direction.

For every value of type float, there exists a value of type double whose range is centered about the center of the float's range. That does not mean the double is an accurate representation of the value in the float. For example, converting 0.1f to double yields a value meaning "something between 13421772.9999999/134217728 and 13421773.0000001/134217728", a value which is off by over a million times the implied tolerance.
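The 0.1f case above can be checked with a minimal Java sketch. The class name WideningDemo and the helper exact are made up for illustration; the only library feature relied on is the BigDecimal(double) constructor, which yields the exact decimal expansion of the stored binary value.

```java
import java.math.BigDecimal;

class WideningDemo {
    // Exact decimal expansion of the binary value held in a double.
    static BigDecimal exact(double x) {
        return new BigDecimal(x);
    }

    public static void main(String[] args) {
        double widened = 0.1f; // implicit float -> double conversion, no cast needed
        double literal = 0.1d; // closest double to 0.1

        System.out.println(exact(widened)); // exactly 13421773/134217728
        System.out.println(exact(literal)); // much closer to 0.1
        System.out.println(widened == literal);      // false: two different doubles
        System.out.println((float) literal == 0.1f); // true: narrowing lands on the same float
    }
}
```

The widened value is numerically identical to the float; what changes is that a double displaying it implies far more digits than the float ever carried.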

For almost every value of type double, there exists a value of type float whose range completely includes the range implied by the double. The only exceptions are values whose range is centered precisely on the boundary between two float values. Converting such values to float would require that the system choose one range or the other; if the system rounds up when the double actually represented a number below the center of its range, or vice versa, the range of the float would not totally encompass that of the double. In practical terms, though, this is a non-issue, since it means that instead of a float cast from a double representing a range like (13421772.5/134217728 to 13421773.5/134217728) it would represent a range like (13421772.4999999/134217728 to 13421773.5000001/134217728). Compared with the horrendous imprecision resulting from a float to double cast, that tiny imprecision is nothing.

BTW, returning to the particular numbers you are using, when you do your calculations as float, the computations are:

0.075f = 20132660±½ / 268435456
0.475f = 31876710±½ /  67108864
Sum    = 18454938±½ /  33554432

In other words, the sum represents a number somewhere between roughly 0.54999999701 and 0.55000002682. The most natural representation is 0.55 (since the actual value could be more or less than that, additional digits would be meaningless).
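The fractions above can be reproduced with a short Java sketch. The class name FloatSumDemo and the method sum are hypothetical names chosen for this example; BigDecimal is used only to print the exact value behind each float.

```java
import java.math.BigDecimal;

class FloatSumDemo {
    // Sum of the two literals from the question, evaluated in float arithmetic.
    static float sum() {
        return 0.475f + 0.075f;
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal(0.075f)); // exact value of the float nearest 0.075
        System.out.println(new BigDecimal(0.475f)); // exact value of the float nearest 0.475
        System.out.println(new BigDecimal(sum()));  // exact value of the rounded sum
        System.out.println(sum() == 0.55f);         // true: the sum is the float nearest 0.55
        System.out.println(sum());                  // 0.55
    }
}
```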

Upvotes: 1

Jim Garrison

Reputation: 86744

Precision just means more bits. A number that cannot be represented as a float may have an exact representation as a double, but the number of such cases is vanishingly small relative to the total number of possible cases.

Simple cases like 0.1 are not representable as a fixed-length binary floating-point number, no matter how many bits are available. This is the same as saying that a fraction such as 1/7 cannot be represented exactly in decimal, regardless of the number of digits you are allowed to use (as long as the number of digits is finite). You can approximate it as 0.142857142857142857... repeating over and over again, but you will never be able to write it EXACTLY no matter how long you go on.

Conversely, if a number is representable exactly as a float, it will also be representable exactly as a double. A double has a larger exponent range and more mantissa bits.
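One way to convince yourself of this is to widen a float to double and narrow it back, then compare bit patterns. The sketch below samples the float bit patterns rather than visiting all of them; the class name RoundTripDemo, its methods, and the step of 9973 are arbitrary choices made for illustration.

```java
class RoundTripDemo {
    // True if widening f to double and narrowing back yields the same float.
    // floatToIntBits maps every NaN to one canonical pattern, so NaNs compare equal too.
    static boolean survivesRoundTrip(float f) {
        double widened = f;
        return Float.floatToIntBits((float) widened) == Float.floatToIntBits(f);
    }

    // Counts sampled float bit patterns that do NOT survive the round trip.
    static int countFailures(int step) {
        int failures = 0;
        // A long counter avoids int overflow at the top of the range.
        for (long bits = Integer.MIN_VALUE; bits <= Integer.MAX_VALUE; bits += step) {
            float f = Float.intBitsToFloat((int) bits);
            if (!survivesRoundTrip(f)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures(9973)); // 0
    }
}
```

Using a step of 1 would check every float, at the cost of a few billion iterations.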

For your example, the cause of the apparent discrepancy is that in float, the difference between 0.475 and its float representation was in the 'right' direction, so that when rounding occurred it went the way you expected. With more precision available, the representation was "closer" to 0.475, but the rounding no longer fell the same way. As a gross, made-up example, let's say that the closest possible float was 0.475006 but in a double the closest possible value was 0.474999. This would give you the results you see.

Edit: Here are the results of a quick experiment:

public class Test {

    public static void main(String[] args)
    {
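        float  f = 0.475f;   // closest float to 0.475
        double d = 0.475d;   // closest double to 0.475

        // printf widens the float to double, so the float's stored value is what gets formatted
        System.out.printf("%20.16f", f);
        System.out.printf("%20.16f", d);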
    }
}

Output:

  0.4749999940395355  0.4750000000000000

What this means is that the floating-point representation of the number 0.475, if you had a huge number of bits, would be just a tiny bit less than 0.475. This is seen in the double representation. However, the first 'wrong' bit occurs so far to the right that when rounded to fit in a float, it just happens to work out to 0.475. This is purely an accident.
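A 16-digit format hides the error in the double. A rough way to look at the sign and size of each error is to subtract the decimal 0.475 using BigDecimal. This is only a sketch; the class name SideOfTarget and the helper error are invented for this example.

```java
import java.math.BigDecimal;

class SideOfTarget {
    static final BigDecimal TARGET = new BigDecimal("0.475"); // the exact decimal value

    // Signed error (stored value minus 0.475) of a binary floating-point value.
    static BigDecimal error(double stored) {
        return new BigDecimal(stored).subtract(TARGET);
    }

    public static void main(String[] args) {
        System.out.println(error(0.475f)); // negative: the float lies below 0.475
        System.out.println(error(0.475d)); // also negative, and far smaller in magnitude
    }
}
```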

Upvotes: 6

Piotr Findeisen

Reputation: 20710

A number that can be represented as a float can be represented as a double too.

What you see is just the formatted output, not the actual binary representation.

System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(.475d + .075d)));
// 11111111100001100110011001100110011001100110011001100110011001
System.out.println(Integer.toBinaryString(Float.floatToRawIntBits(.475f + .075f)));
// 111111000011001100110011001101

double d = .475d + .075d;
System.out.println(d);
// 0.5499999999999999
System.out.println((float)d);
// 0.55 (as expected)
System.out.println((double)(float)d);
// 0.550000011920929

System.out.println( .475f + .075f == 0.550000011920929d);
// true
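The reason the same value prints differently as float and as double is that Float.toString and Double.toString emit only as many digits as are needed to distinguish the value from its neighbours of the same type. A small sketch of that effect follows; the class name PrintedVsStored is made up for illustration.

```java
class PrintedVsStored {
    public static void main(String[] args) {
        float f = .475f + .075f;
        double widened = f; // same numeric value, now held in a double

        System.out.println(f);                // 0.55
        System.out.println(widened);          // more digits, although the value is unchanged
        System.out.println(widened == f);     // true: widening did not change the value
        System.out.println(widened == 0.55d); // false: the double nearest 0.55 is a different number

        // Parsing the printed text always recovers the same double.
        System.out.println(Double.parseDouble(Double.toString(widened)) == widened); // true
    }
}
```

So "0.55" for the float does not mean the float holds exactly 0.55, only that no other float is closer to 0.55.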

Upvotes: 9
