Peter
Peter

Reputation: 3165

Can the product of two "whole" doubles always be correctly represented?

This seems like a very basic question but my knowledge of floating point numbers is getting hazy. Say I have two doubles (according to IEEE 754) which both exactly represent a whole number, for example 10 or -1234. If I multiply them together, will the result also always be an exact representation of the "correct" result I would have gotten if both numbers were arbitrary precision integers?

Upvotes: 0

Views: 51

Answers (2)

vinc17
vinc17

Reputation: 3476

The answer is no, but perhaps you would be interested in a simple condition on the values under which this is true. First, multiply each input value by a power of 2 so that you get odd integers M and N. Let r and s be the smallest integers such that |M| < 2r and |N| < 2s (said otherwise, r and s are the number of bits needed to write M and N in binary). Then |MN| < 2r+s. So, if r+s ⩽ 53 (where 53 is the precision of binary64, a.k.a. double precision), then the product of the two values will be correctly represented.

Note that since r and s are minimal, one also has |M| ⩾ 2r−1 and |N| ⩾ 2s−1, so that |MN| ⩾ 2r+s−2. And since MN is an odd integer, you will need at least r+s−1 bits to represent it (and there are no trailing zeros). Therefore, if r+s ⩾ 55, then the product of the two values will not be correctly represented. For r+s = 54, you cannot conclude.

Note also that concerning the r+s ⩽ 53 condition, the integers M and N do not need to be odd, but if they are not odd, you may miss cases where the product is actually correctly represented. That said, if your inputs are smaller integers (e.g. both ⩽ 226 in absolute value), you can immediately conclude that their product will be correctly represented.

Upvotes: 0

Eric Postpischil
Eric Postpischil

Reputation: 222942

No. A trivial proof for any numerical format is to realize that nn is not representable in the format, where n is the greatest whole number representable in the format, provided 1 < n.

For a concrete demonstration:

IEEE-754 binary64 (“double precision”) represents a number as ±F•2e, where F is a number representable with 53 binary digits and e is an integer within certain limits. (Details about scaling of F and its relationship to e and limits on e are not discussed in this answer.) Observe the number of significant digits in a binary numeral for ±F•2e is not affected by e; it is always at most 53 binary digits. (For this purpose, significant digits are those from the first non-zero digit to the last. They do not include zeros that only establish position, such as the zeros in “.0011”.)

Consider 252+1. This is representable with 53 binary digits (a 1 followed by 51 0s followed by a 1). It is a whole number. When multiplied by itself, the product is 2104 + 2•252 + 1 = 2104 + 253 + 1. That number requires 105 significant binary digits to represent, so it cannot be represented in the binary64 format.

Upvotes: 3

Related Questions