Reputation: 6980
I have a question that arose from another question about the precision of floating-point numbers.
I know that decimal values cannot always be represented exactly in floating point, so they are stored as the closest value that can be represented.
My question is actually about the difference in representation between float and double.
Where does this question arise from?
Suppose I do:
System.out.println(.475d+.075d);
then the output is not 0.55 but 0.549999 (on my machine).
However, when I do:
System.out.println(.475f+.075f);
I get the correct answer, i.e. 0.55 (a little unexpected for me).
Until now I was under the impression that double has more precision than float (that is, a double is accurate to a larger number of decimal places). So, if a value cannot be represented exactly as a double, then its equivalent float representation will also be stored inexactly.
However, the results I got are a little disturbing for me. I am confused about:
whether I have misunderstood what precision means?
whether float and double are represented differently, apart from the fact that double has more bits?
Upvotes: 3
Views: 2910
Reputation: 81115
If one regards floating-point types as representing ranges of values rather than discrete values (e.g. 0.1f doesn't represent 13421773/134217728, but rather "something between 13421772.5/134217728 and 13421773.5/134217728"), conversions from double to float will usually be accurate, while conversions from float to double will usually not. Unfortunately, Java allows the usually-inaccurate conversions to be performed implicitly, while requiring a typecast in the usually-accurate direction.
For every value of type float, there exists a value of type double whose range is centered about the center of the float's range. That does not mean the double is an accurate representation of the value in the float. For example, converting 0.1f to double yields a value meaning "something between 13421772.9999999/134217728 and 13421773.0000001/134217728", a value which is off by over a million times the implied tolerance.
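The asymmetry between the two conversion directions can be checked with a small sketch. The class name WideningDemo and the helper significand are made up for illustration; the sketch only assumes Java's standard IEEE 754 float and double types.

```java
final class WideningDemo {
    // Integer significand of a normal float, including the implicit leading 1 bit.
    // (Hypothetical helper; valid for normal, non-zero values only.)
    static int significand(float f) {
        int bits = Float.floatToIntBits(f);
        return (bits & 0x007FFFFF) | 0x00800000;
    }

    public static void main(String[] args) {
        float f = 0.1f;
        double widened = f;              // float -> double: implicit, no cast needed
        float narrowed = (float) 0.1d;   // double -> float: requires a cast

        System.out.println(significand(f));     // 13421773
        System.out.println(widened);            // 0.10000000149011612
        System.out.println(widened == 0.1d);    // false: not the double closest to 0.1
        System.out.println(narrowed == 0.1f);   // true: same float as the literal 0.1f
    }
}
```

The widened value keeps the float's bits exactly, which is why it no longer compares equal to the double literal 0.1d.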
For almost every value of type double, there exists a value of type float whose range completely includes the range implied by the double. The only exceptions are values whose range is centered precisely on the boundary between two float values. Converting such values to float would require that the system choose one range or the other; if the system rounds up when the double actually represented a number below the center of its range, or vice versa, the range of the float would not totally encompass that of the double. In practical terms, though, this is a non-issue, since it means that instead of a float cast from a double representing a range like (13421772.5/134217728 to 13421773.5/134217728) it would represent a range like (13421772.4999999/134217728 to 13421773.5000001/134217728). Compared with the horrendous imprecision resulting from a float to double cast, that tiny imprecision is nothing.
BTW, returning to the particular numbers you are using, when you do your calculations as float, the computations are:
0.075f = 20132660±½ / 268435456
0.475f = 31876710±½ / 67108864
Sum = 18454938±½ / 33554432
In other words, the sum represents a number somewhere between roughly 0.54999999701 and 0.55000002682. The most natural representation is 0.55 (since the actual value could be more or less than that, additional digits would be meaningless).
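As a rough check of the float computation, the sketch below prints the sum as Java formats it, its exact stored value, and the spacing between neighbouring floats at that magnitude. The class name FloatSumDemo and the helpers sum and exact are illustrative names, not part of the original post.

```java
import java.math.BigDecimal;

final class FloatSumDemo {
    // The float addition from the question.
    static float sum() {
        float a = 0.475f;
        float b = 0.075f;
        return a + b;
    }

    // Exact decimal expansion of the binary value held in a float.
    static String exact(float f) {
        return new BigDecimal((double) f).toPlainString();
    }

    public static void main(String[] args) {
        float s = sum();
        System.out.println(s);            // 0.55
        System.out.println(exact(s));     // 0.550000011920928955078125
        System.out.println(s == 0.55f);   // true: same float as the literal 0.55f
        System.out.println(Math.ulp(s));  // gap to the next float of this magnitude
    }
}
```

Since the stored sum is the same float that the literal 0.55f maps to, the shortest decimal string that identifies it is "0.55".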
Upvotes: 1
Reputation: 86744
Precision just means more bits. A number that cannot be represented exactly as a float may have an exact representation as a double, but the number of such cases is vanishingly small relative to the total number of possible cases.
Simple cases like 0.1 are not representable exactly as a fixed-length binary floating-point number, no matter how many bits are available. This is the same as saying that a fraction such as 1/7 cannot be represented exactly in decimal, regardless of the number of digits you are allowed to use (as long as the number of digits is finite). You can approximate it as 0.142857142857142857... repeating over and over again, but you will never be able to write it EXACTLY no matter how long you go on.
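To see this concretely, a small sketch can print the exact value that ends up stored for a few literals. The class name ExactValueDemo and the helper exact are assumptions for illustration; the BigDecimal(double) constructor converts the binary value exactly, without rounding.

```java
import java.math.BigDecimal;

final class ExactValueDemo {
    // Exact decimal expansion of the binary value held in a double.
    // A float argument is widened first, which preserves its value exactly.
    static String exact(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        // 0.375 = 3/8 is a sum of powers of two, so it is stored exactly.
        System.out.println(exact(0.375));  // 0.375
        // 0.1 is not, so both types store a nearby value; double is just closer.
        System.out.println(exact(0.1f));   // 0.100000001490116119384765625
        System.out.println(exact(0.1d));   // 0.1000000000000000055511151231257827021181583404541015625
    }
}
```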
Conversely, if a number is representable exactly as a float, it will also be representable exactly as a double. A double has a larger exponent range and more mantissa bits.
For your example, the cause of the apparent discrepancy is that in float, the difference between 0.475 and its float representation was in the 'right' direction, so that when the result was rounded it went how you expected it. When increasing the precision available, the representation was "closer" to 0.475 but now on the opposite side. As a gross example, let's say that the closest possible float was 0.475006 but in a double the closest possible value was 0.474999. This would give you the results you see.
Edit: Here's the results of a quick experiment:
public class Test {
    public static void main(String[] args)
    {
        float f = 0.475f;
        double d = 0.475d;
        System.out.printf("%20.16f", f);
        System.out.printf("%20.16f", d);
    }
}
Output:
0.4749999940395355 0.4750000000000000
What this means is that the floating-point representation of the number 0.475, if you had a huge number of bits, would be just a tiny bit less than 0.475. This is seen in the double representation. However, the first 'wrong' bit occurs so far to the right that when truncated to fit in a float, it just happens to work out to 0.475. This is purely an accident.
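One way to check on which side of the decimal value each stored number falls is to compare exact values with BigDecimal. This is only a sketch: the class name RoundingDirectionDemo and the helper side are hypothetical, and the results assume the usual IEEE 754 round-to-nearest behaviour of Java's float and double.

```java
import java.math.BigDecimal;

final class RoundingDirectionDemo {
    // Returns -1 if the stored binary value lies below the decimal literal,
    // 1 if it lies above, and 0 if it is exactly equal.
    static int side(double stored, String decimal) {
        return new BigDecimal(stored).compareTo(new BigDecimal(decimal));
    }

    public static void main(String[] args) {
        // The operands as float and as double.
        System.out.println(side(0.475f, "0.475"));          // -1
        System.out.println(side(0.075f, "0.075"));          // 1
        System.out.println(side(0.475d, "0.475"));          // -1
        System.out.println(side(0.075d, "0.075"));          // -1
        // The two sums from the question.
        System.out.println(side(0.475f + 0.075f, "0.55"));  // 1
        System.out.println(side(0.475d + 0.075d, "0.55"));  // -1
    }
}
```

In this run the two float operands err in opposite directions, while both double operands fall slightly below their decimal values, so the double sum ends up just under 0.55.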
Upvotes: 6
Reputation: 20710
A number that can be represented as a float can be represented as a double too.
What you read is just formatted output; you are not reading the actual binary representation.
System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(.475d + .075d)));
// 11111111100001100110011001100110011001100110011001100110011001
System.out.println(Integer.toBinaryString(Float.floatToRawIntBits(.475f + .075f)));
// 111111000011001100110011001101
double d = .475d + .075d;
System.out.println(d);
// 0.5499999999999999
System.out.println((float)d);
// 0.55 (as expected)
System.out.println((double)(float)d);
// 0.550000011920929
System.out.println( .475f + .075f == 0.550000011920929d);
// true
Upvotes: 9