IEEE Floating Point Number Rounding

Question

For IEEE floating point numbers, why do these two expressions evaluate to machine epsilon M (up to sign)?

(1 - 1e-16) - 1
ans = -1.1102e-16

1 + (1e-16 - 1)
ans = 1.1102e-16

whereas this one evaluates to 0:

(1 + 1e-16) - 1
ans = 0

Can someone explain why? For the last one, I understand that fl(x) = 1 for 1 ≤ x ≤ 1 + M and fl(x) = 1 + 2M for 1 + M < x ≤ 1 + 2M.

Shouldn't the first two equal 0, since fl(x) = 1 for 1 - M < x ≤ 1 and fl(x) = 1 - 2M for 1 - 2M ≤ x ≤ 1 - M?

Simon Byrne · Accepted Answer

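Machine epsilon is not a consistently defined term: different sources and languages define it in different ways. For example, MATLAB's eps is 2^-52 ≈ 2.22 × 10^-16 (the gap between 1 and the next larger double), while many numerical analysis texts use 2^-53, the unit roundoff.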

If we take M to be 2^-53 ≈ 1.11 × 10^-16 (i.e. the gap between 1 and the previous double-precision floating point number, or half the gap between 1 and the next one), then the floating point number line near 1 looks something like this:

--|-----|-----|-----|-----------|-----------|----
      1-2M   1-M    1         1+2M        1+4M
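The uneven spacing in the diagram can be checked directly. This is a minimal sketch in Python rather than MATLAB (a swap for illustration), assuming Python 3.9+ for `math.nextafter` and that Python's `float` is an IEEE 754 double, as it is on mainstream platforms. The names `below` and `above` are just illustrative.

```python
import math
import sys

M = 2.0 ** -53   # the answer's M; a power of two, so exactly representable

below = math.nextafter(1.0, 0.0)   # largest double smaller than 1
above = math.nextafter(1.0, 2.0)   # smallest double larger than 1

# Both subtractions are exact, so these compare the true gaps.
print(1.0 - below == M)                  # True: the gap below 1 is M
print(above - 1.0 == 2 * M)              # True: the gap above 1 is 2M
print(sys.float_info.epsilon == 2 * M)   # True: Python's epsilon is the gap above 1
```

Note that Python's `sys.float_info.epsilon` follows the 2^-52 convention, so it is 2M in this answer's notation.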

So, under the default rounding mode (round to nearest, ties to even), we have:

fl(x) = 1-M for 1 - 3M/2 < x < 1 - M/2
fl(x) = 1 for 1 - M/2 ≤ x ≤ 1 + M
fl(x) = 1+2M for 1 + M < x < 1 + 3M
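The interval endpoints, including which way the exact ties go, can be probed with exact rational arithmetic. A minimal sketch in Python, assuming that `float(Fraction)` is correctly rounded (CPython converts with integer true division, which rounds to nearest, ties to even); the helper `fl` and the nudge `tiny` are hypothetical names for illustration only.

```python
from fractions import Fraction

M = Fraction(1, 2**53)   # the answer's M, as an exact rational
ONE = Fraction(1)

def fl(x: Fraction) -> Fraction:
    """Round the exact value x to the nearest double and return that double exactly.

    Relies on float(Fraction) being correctly rounded (ties to even).
    """
    return Fraction(float(x))

tiny = Fraction(1, 2**60)  # a nudge much smaller than M, used to step off the ties

cases = [
    ("1 - 3M/2 (tie)", ONE - 3 * M / 2),
    ("1 - 3M/2 + tiny", ONE - 3 * M / 2 + tiny),
    ("1 - M/2 - tiny", ONE - M / 2 - tiny),
    ("1 - M/2 (tie)", ONE - M / 2),
    ("1 + M (tie)", ONE + M),
    ("1 + M + tiny", ONE + M + tiny),
    ("1 + 3M - tiny", ONE + 3 * M - tiny),
    ("1 + 3M (tie)", ONE + 3 * M),
]

for label, x in cases:
    k = (fl(x) - ONE) / M   # express fl(x) as 1 + k*M
    print(f"{label:<16} -> fl(x) = 1 + ({k})M")
```

Running it should give k = -2, -1, -1, 0, 0, 2, 2, 4 in that order. Each tie lands on the neighbour with an even significand (1, 1 - 2M, 1 + 4M), which is why the intervals above are closed at 1 - M/2 and 1 + M but open elsewhere.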

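So in the first case, 1 - 1e-16 lies between 1 - 3M/2 and 1 - M/2, so it rounds to 1 - M, and subtracting 1 then gives exactly -M ≈ -1.1102e-16. The second case is the same computation with the sign flipped: 1e-16 - 1 rounds to -(1 - M), so adding 1 gives exactly M. In the third case, 1 + 1e-16 lies between 1 and 1 + M, so it rounds to 1 and the result is 0. Your intervals for the first two cases assume a spacing of 2M below 1, but the spacing there is only M.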
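The three expressions from the question can be reproduced outside MATLAB. A minimal sketch in Python, assuming its `float` is an IEEE 754 double with the default round-to-nearest-even mode; the variable names `a`, `b`, `c` are just for illustration.

```python
M = 2.0 ** -53   # the answer's M

# 1e-16 sits strictly between M/2 and 3M/2, so 1 - 1e-16 rounds to 1 - M,
# and 1e-16 < M, so 1 + 1e-16 rounds to 1.
print(M / 2 < 1e-16 < 3 * M / 2)   # True

a = (1 - 1e-16) - 1   # 1 - 1e-16 -> 1 - M, then the subtraction is exact
b = 1 + (1e-16 - 1)   # 1e-16 - 1 -> -(1 - M), then the addition is exact
c = (1 + 1e-16) - 1   # 1 + 1e-16 -> 1, so the difference is 0

print(a, b, c)                     # -1.1102230246251565e-16 1.1102230246251565e-16 0.0
print(a == -M, b == M, c == 0.0)   # True True True
```

Python prints more digits than MATLAB's default format, but the values are the same ±2^-53 and 0.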
