Reputation: 89
I am aware of the property of binary floating point whereby computers cannot represent some decimal values exactly. I was wondering if there is any "logic" to knowing which floats will be rounded and which will not?
For example, when I run 0.1 + 0.2 in my console it returns 0.30000000000000004. Yet when I run 0.1 + 0.3 it correctly returns 0.4.
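The two console runs in question:

```javascript
// The two sums from the question, as evaluated in a JavaScript console.
console.log(0.1 + 0.2); // 0.30000000000000004
console.log(0.1 + 0.3); // 0.4
```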
Is there any logic that determines which particular floats will not be rounded 'correctly'?
Upvotes: 4
Views: 309
Reputation: 224310
A finite number can be represented in the common IEEE-754 double-precision format if and only if it equals M•2^e for some integers M and e such that -2^53 < M < 2^53 and -1074 ≤ e ≤ 971.
Every other finite number converted from decimal or resulting from another operation will be rounded.
(This is the format JavaScript uses because it conforms to ECMA-262, which says that the IEEE-754 64-bit binary floating-point format is used. The significand, M in the above, is often expressed as a value between 1 and 2 with a certain number of bits after a radix point, but I scaled it to an integer for easier analysis, and the exponent bounds are adjusted to match.)
This means none of the numbers in your example (0.1, 0.2, 0.3, and 0.4) is representable; when they are converted to the Number format, the results are rounded.
In contrast, the numbers 0.25 and 0.375 are representable. When we repeatedly multiply 0.25 by 2, we get 0.5 and then 1, so 0.25 = 1•2^-2, which matches the format above. And 0.375 produces 0.75, 1.5, and then 3, so 0.375 = 3•2^-3, which also matches the format.
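As a sketch, the criterion above can be checked mechanically. isExactDouble below is an illustrative helper (not a standard API) that writes a decimal numeral as an exact fraction using BigInt and tests the bounds on M and e:

```javascript
// Sketch: decide whether a decimal numeral is exactly representable as an
// IEEE-754 double, i.e. equals M * 2**e with -2**53 < M < 2**53 and
// -1074 <= e <= 971. (isExactDouble is an illustrative helper.)
function isExactDouble(decimal) {
  // Write the numeral as an exact fraction p / q with q a power of ten.
  const [intPart, fracPart = ""] = decimal.split(".");
  let p = BigInt(intPart + fracPart);
  let q = 10n ** BigInt(fracPart.length);
  // Cancel common factors of 2 and 5 (the prime factors of 10).
  while (p !== 0n && p % 2n === 0n && q % 2n === 0n) { p /= 2n; q /= 2n; }
  while (p !== 0n && p % 5n === 0n && q % 5n === 0n) { p /= 5n; q /= 5n; }
  // The remaining denominator must be a pure power of 2.
  let e = 0;
  while (q % 2n === 0n) { q /= 2n; e -= 1; }
  if (q !== 1n) return false; // a factor of 5 (or another prime) survived
  // Normalize M to be odd so the significand bound is meaningful.
  let M = p;
  while (M !== 0n && M % 2n === 0n) { M /= 2n; e += 1; }
  return M === 0n ||
    (-(2n ** 53n) < M && M < 2n ** 53n && -1074 <= e && e <= 971);
}
```

For example, isExactDouble("0.375") is true (3•2^-3) while isExactDouble("0.1") is false, because after cancellation the denominator still contains a factor of 5.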
Two confounding issues create the illusion that some operations are exact:
- When a decimal numeral is converted to the Number format, the result is the nearest representable Number value. This comes from step 5 in clause 7.1.12.1 of the ECMAScript 2017 Language Specification.
- When a Number is displayed, the default formatting produces the shortest decimal numeral that uniquely identifies it: converting that numeral back to a Number and then displaying it produces the same number. For example, we have 0.12345 → 0.123450000000000004174438572590588591992855072021484375 → "0.12345". The default formatting rule causes any numeral up to 15 significant digits to be the one produced by displaying the Number value that results from that numeral.

Sometimes, when evaluating a + b == c for decimal numerals a, b, and c, the rounding of a + b happens to coincide with the rounding that occurs for c. Sometimes it does not.
- In 0.1 + 0.3 == 0.4, the values 0.1000000000000000055511151231257827021181583404541015625 and 0.299999999999999988897769753748434595763683319091796875 are added, and the rounded result is 0.40000000000000002220446049250313080847263336181640625. That is the same as the result of converting 0.4, so the evaluation reports true even though there were rounding errors.
- In 0.1 + 0.2 == 0.3, the values 0.1000000000000000055511151231257827021181583404541015625 and 0.200000000000000011102230246251565404236316680908203125 are added, and the rounded result is 0.3000000000000000444089209850062616169452667236328125. That differs from the result for .3, which is 0.299999999999999988897769753748434595763683319091796875. So the evaluation reports false.

The latter result shows us why displaying the result of 0.1 + 0.2 produces "0.30000000000000004". It is close to 0.3, but 0.299999999999999988897769753748434595763683319091796875 is closer, so, to uniquely distinguish 0.3000000000000000444089209850062616169452667236328125 from that closer value, JavaScript has to use more digits; it produces zeros until it gets to the first non-zero digit, resulting in "0.30000000000000004".
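Both effects can be observed directly: the default conversion to string uses just enough digits, while toFixed with a large digit count (up to 100 fraction digits are allowed) reveals the exact stored values:

```javascript
// Shortest round-trip formatting hides most rounding:
console.log(String(0.1 + 0.3)); // "0.4"
console.log(String(0.1 + 0.2)); // "0.30000000000000004"
// toFixed with enough digits shows the exact values stored:
console.log((0.1).toFixed(55));       // the exact Number nearest 0.1
console.log((0.1 + 0.2).toFixed(52)); // the exact rounded sum
```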
We could ask when a + b == c will evaluate to true. The mathematics absolutely determines this: a, b, and c are each converted to the nearest representable value, the addition is performed and its result is rounded to the nearest representable value, and then the expression is true if the left and right results are equal. But there is no simple pattern for this. It depends on the patterns the decimal numerals form in binary. You can find various patterns here and there. But, by and large, they are effectively random.
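A small experiment (a sketch, assuming Node or a browser console) illustrates the lack of a simple pattern, even for one-digit decimal operands:

```javascript
// For every pair of one-digit decimal fractions, check whether the
// floating-point sum compares equal to the Number nearest the decimal sum.
let hits = 0, total = 0;
for (let i = 1; i < 10; i++) {
  for (let j = 1; j < 10; j++) {
    total++;
    if (i / 10 + j / 10 === (i + j) / 10) hits++;
  }
}
console.log(`${hits} of ${total} sums compare equal`);
```

Some pairs match (0.1 + 0.3 == 0.4) and some do not (0.1 + 0.2 != 0.3), with no obvious rule separating them.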
Upvotes: 2
Reputation: 9455
Floating-point rounding comes down to mathematics; it is part of number theory.
I'll first explain it a bit in decimal and then show how it works in binary:
A number like 0.12 is basically "zero + 1 times 1/10 + 2 times 1/10^2", or 12/100. This is a so-called "rational" number, a number that can be written as a ratio of two integers (12/100 = 0.12, 1/4 = 0.25, and 1/2 = 0.5 are all rational numbers). Numbers that are not rational, like pi, e, or the square root of 2, cannot be written as such a ratio in any numbering system.
Now, can every rational number be written with a terminating expansion? We know this isn't the case in decimal: 1/3 cannot be, nor can 1/7. But some can, and it turns out there is logic behind which ones:
Any rational number whose denominator (in lowest terms) has only prime factors that are also prime factors of the base can be written with a terminating expansion in that base. The prime factors of 10 are 2 and 5, so any rational number whose denominator's prime factors are only 2 and 5 can be written exactly in base 10. In other words, any number of the form x/(2^p * 5^q) (or any sum of such numbers):
3/8 = 3/(2^3) = 0.375
1/80 = 1/(2^4 * 5^1) = 0.0125
but not:
1/65 = 1/(5^1 * 13^1) = 0.0153846153846...
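A sketch of this rule in code; terminates is an illustrative helper that reduces the fraction and then strips from the denominator every factor it shares with the base:

```javascript
// Sketch: does num/den have a terminating expansion in the given base?
// Reduce the fraction, then strip from the denominator every factor it
// shares with the base; the expansion terminates iff nothing is left.
function terminates(num, den, base) {
  const gcd = (a, b) => (b === 0 ? a : gcd(b, a % b));
  let d = den / gcd(num, den); // denominator in lowest terms
  for (let f = 2; f <= base; f++) {
    if (base % f === 0) {
      while (d % f === 0) d /= f; // remove this shared factor completely
    }
  }
  return d === 1;
}
console.log(terminates(3, 8, 10));  // true:  0.375
console.log(terminates(1, 80, 10)); // true:  0.0125
console.log(terminates(1, 65, 10)); // false: 0.0153846153846...
console.log(terminates(1, 10, 2));  // false: 1/10 never terminates in binary
```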
Now back to floating point on a computer: the floating-point unit works in binary, a base-2 system, whose only prime factor is 2. So any number that can be written as x/(2^a) can be stored in a floating-point unit without losing accuracy, and any number not of that form cannot.
There is however one caveat: the floating-point unit also has limited precision, which restricts the range of numbers further. IEEE 754-2008 specifies that double-precision numbers have a 52-bit stored significand ("mantissa"); since base 2 has only a single prime factor anyway, this limits the above formula to a <= 52.
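This precision limit is easy to observe in JavaScript with integers (the 52 stored bits plus one implicit leading bit give 53 significant bits):

```javascript
// Beyond 2**53, consecutive integers can no longer be distinguished:
console.log(2 ** 53 === 2 ** 53 + 1); // true: 2**53 + 1 rounds back down
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991, i.e. 2**53 - 1
// Sums of exactly representable dyadic fractions stay exact:
console.log(0.375 + 0.25 === 0.625);  // true: 3/8 + 1/4 = 5/8, all x/(2^a)
```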
Upvotes: 2
Reputation: 26185
paul23's answer deals with the general principles. This answer analyzes the specific cases in the question.
For each string representing a decimal number, round-to-nearest will result in a specific 64-bit binary IEEE754 number. Here are the mappings for the numbers in the question:
0.1 0.1000000000000000055511151231257827021181583404541015625
0.2 0.200000000000000011102230246251565404236316680908203125
0.3 0.299999999999999988897769753748434595763683319091796875
0.30000000000000004 0.3000000000000000444089209850062616169452667236328125
0.4 0.40000000000000002220446049250313080847263336181640625
On conversion to floating point, both 0.1 and 0.2 rounded up, so their sum is greater than 0.3. On the other hand, 0.3 rounded down, so the sum is also greater than the closest floating point to 0.3. The exact sum of the two floats lies half way between two representable values; the rounding error in either direction is 2.77555756156289135105907917022705078125E-17, and the round-to-even rule results in rounding up.
When 0.1 and 0.3 were added, the rounding errors on the inputs were in opposite directions. The exact sum was 0.3999999999999999944488848768742172978818416595458984375, which is exactly half way between representable numbers 0.399999999999999966693309261245303787291049957275390625 and 0.40000000000000002220446049250313080847263336181640625. The rounding error is 2.77555756156289135105907917022705078125E-17 either way.
The hex representation of the bit pattern for the larger is 3fd999999999999a, which is even, so that is the way the rounding goes. As it happens, that is also the closest float to 0.4.
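The bit patterns can be inspected from JavaScript as well; doubleToHex is an illustrative helper built on DataView:

```javascript
// Illustrative helper: view the 64-bit pattern of a double as hex.
function doubleToHex(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  return view.getBigUint64(0).toString(16).padStart(16, "0");
}
console.log(doubleToHex(0.4));       // "3fd999999999999a"
console.log(doubleToHex(0.1 + 0.3)); // same bit pattern as 0.4
console.log(doubleToHex(0.1 + 0.2)); // differs from doubleToHex(0.3)
```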
Unless you confine yourself to arithmetic on numbers that can all be exactly represented in 64-bit binary floating point, it is very hard to predict which calculations will produce the float closest to the intended decimal result and which will not. If this matters, either you are printing your output with too many decimal places or you need a different data type.
Upvotes: 2