Reputation: 875
In IEEE 754 there is a "Round to Nearest" method of rounding floating point values.
But I do not understand one item in that definition:
If the two nearest representable values are equally near, the one with its least significant bit zero is chosen
What does "the one with its least significant bit zero is chosen" mean?
Upvotes: 3
Views: 5747
Reputation: 20372
It simply means that ties are resolved by rounding to even, also known as banker's rounding. For example, 3.5 is rounded to 4.0, and 4.5 is also rounded to 4.0 rather than 5.0. The same rule applies to values that are too large to be represented exactly. For example, when converted to 32-bit floating point, the integer 16777219 is rounded to 16777220.0 and not 16777218.0, because the latter's representation ends with a one.
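A minimal C sketch of that last example (assuming an IEEE 754 platform where the conversion uses the default round-to-nearest mode):

#include <stdio.h>

int main(void) {
  /* 16777219 lies exactly halfway between the two nearest
     representable floats, 16777218.0f and 16777220.0f.
     The tie goes to 16777220.0f, whose last significand bit is 0. */
  float f = (float) 16777219;
  printf("%.1f\n", f); /* prints 16777220.0 */
  return 0;
}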
Upvotes: 1
Reputation: 875
It looks like I have understood the issue. Single- and double-precision numbers are represented as sequences of 32 and 64 bits, respectively, laid out as follows:
b bbbbbbbb bbbbbbbbbbbbbbbbbbbbbbb
b bbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
Here each b is zero or one. The first group is the sign of the number. The second group is the exponent and consists of 8 bits (single precision) or 11 bits (double precision). The third group is the mantissa and consists of 23 bits (single precision) or 52 bits (double precision).
Hence, the least significant bit of a number is the 23rd bit of the mantissa for a single-precision number and the 52nd bit of the mantissa for a double-precision number. It is the rightmost bit of the number. When the two nearest representable values are equally near, the one whose rightmost bit is zero is chosen.
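To look at that rightmost bit directly, the 32 bits of a float can be reinterpreted as an integer and split into the three groups; a small illustrative sketch (the helper name show_bits is mine, not part of any standard API):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print the sign, exponent and mantissa fields of a float,
   plus its least significant (rightmost) mantissa bit. */
static void show_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u); /* reinterpret the 32 bits */
  printf("%.1f: sign=%u exponent=0x%02x mantissa=0x%06x lsb=%u\n",
         f, (unsigned) (u >> 31), (unsigned) ((u >> 23) & 0xff),
         (unsigned) (u & 0x7fffff), (unsigned) (u & 1));
}

int main(void) {
  show_bits(16777218.0f); /* mantissa ends in 1 (odd)  */
  show_bits(16777220.0f); /* mantissa ends in 0 (even) */
  return 0;
}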
Note:
Evenness and oddness are defined only for integers.
Hence, when the rounding function rounds to integer values only, this rule degenerates to the round-half-to-even rule.
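As a sketch of that special case (assuming the default round-to-nearest mode), the C99 rint function rounds to an integer in the current rounding mode, so halfway cases land on the even integer:

#include <math.h>
#include <stdio.h>

int main(void) {
  /* Ties between two integers go to the even one. */
  printf("%.1f %.1f %.1f %.1f\n",
         rint(0.5), rint(1.5), rint(2.5), rint(3.5)); /* 0.0 2.0 2.0 4.0 */
  return 0;
}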
Thanks to everyone for your efforts.
Upvotes: 2
Reputation: 80276
The best way to play with the round-to-even rule is to round double-precision numbers written in hexadecimal to single-precision, for instance in the C99 or Java programming languages.
Single precision has 23 explicit significand bits, so the numbers 0x1.000000p0, 0x1.000002p0, 0x1.000004p0, … are single-precision numbers, but the numbers in between are not.
When a value is exactly in between two consecutive single-precision floating-point numbers l and u, the binary expansions of l and u differ in the 23rd bit after the point in the notation 1.bbbbbbbbbbbbbbbbbbbbbbb × 2^exp. This is a simple consequence of l and u being consecutive.
The double-precision numbers 0x1.000001p0, 0x1.000003p0, 0x1.000005p0, … are exactly in between two single-precision numbers and need to be rounded according to the “least significant bit zero” rule.
Example C99 program:
#include <stdio.h>
#include <stdlib.h>

int main(int c, char *v[]) {
  /* A double that lies exactly halfway between two consecutive
     single-precision numbers. */
  double d = 0x1.000001p0;
  for (int i = 0; i < 10; i++) {
    /* Print the double and its round-to-nearest single-precision value. */
    printf("double-precision:%.6a\n"
           "single-precision:%.6a\n\n",
           d, (float) d);
    d += 0x0.000002p0; /* step to the next halfway point */
  }
}
Results illustrating how the rounding goes to the single-precision value with a 0 as the 23rd binary digit after the point:
double-precision:0x1.000001p+0
single-precision:0x1.000000p+0

double-precision:0x1.000003p+0
single-precision:0x1.000004p+0

double-precision:0x1.000005p+0
single-precision:0x1.000004p+0

double-precision:0x1.000007p+0
single-precision:0x1.000008p+0

double-precision:0x1.000009p+0
single-precision:0x1.000008p+0

double-precision:0x1.00000bp+0
single-precision:0x1.00000cp+0

double-precision:0x1.00000dp+0
single-precision:0x1.00000cp+0

double-precision:0x1.00000fp+0
single-precision:0x1.000010p+0

double-precision:0x1.000011p+0
single-precision:0x1.000010p+0

double-precision:0x1.000013p+0
single-precision:0x1.000014p+0
Upvotes: 1