I have two binary fraction numbers that I want to add:
1.100110011001100110011001100110011001100110011001101 x 2^-4
and
11.001100110011001100110011001100110011001100110011010 x 2^-4
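These bit patterns look like the IEEE 754 doubles nearest 0.1 and 0.2. That is an inference from the digits, not something the question states. A small Python check using only the standard `fractions` module converts each string to an exact rational and compares it with the exact value of the corresponding float. `binary_to_fraction` is a hypothetical helper written for this sketch.

```python
from fractions import Fraction


def binary_to_fraction(s: str) -> Fraction:
    """Exact value of an unsigned binary string such as "11.0101"."""
    whole, _, frac = s.partition(".")
    return Fraction(int(whole + frac, 2), 2 ** len(frac))


# Operands from the question, each multiplied by the shared factor 2**-4.
a = binary_to_fraction("1.100110011001100110011001100110011001100110011001101") / 16
b = binary_to_fraction("11.001100110011001100110011001100110011001100110011010") / 16

# Fraction(float) is exact, so this compares bit-for-bit with the stored doubles.
print(a == Fraction(0.1), b == Fraction(0.2))
```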
If I simply add them, the result seems to overflow (54 bits):
1.100110011001100110011001100110011001100110011001101
+ 11.001100110011001100110011001100110011001100110011010
-----------------------------------------------------
100.110011001100110011001100110011001100110011001100111
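One way to double-check this addition is a minimal Python sketch that treats each binary string as an exact integer scaled by a power of two, so Python's arbitrary-precision integers add them with no rounding at all. The helpers `parse_binary` and `format_binary` are hypothetical names introduced only for this illustration.

```python
def parse_binary(s: str) -> tuple[int, int]:
    """Split "11.0101" into (n, k) so that the value is n / 2**k exactly."""
    whole, _, frac = s.partition(".")
    return int(whole + frac, 2), len(frac)


def format_binary(n: int, k: int) -> str:
    """Render n / 2**k as a binary string with k fraction bits (k >= 1)."""
    digits = format(n, "b").rjust(k + 1, "0")
    return digits[:-k] + "." + digits[-k:]


# The two significands from the question; the shared factor 2**-4 stays implicit.
a, ka = parse_binary("1.100110011001100110011001100110011001100110011001101")
b, kb = parse_binary("11.001100110011001100110011001100110011001100110011010")

# Align both operands to the same number of fraction bits, then add exactly.
k = max(ka, kb)
total = (a << (k - ka)) + (b << (k - kb))

sum_str = format_binary(total, k)
# Count bits from the leading 1 to the last 1.
significant_bits = len(format(total, "b").rstrip("0"))
print(sum_str, significant_bits)
```

Nothing is lost at this stage: the 54-bit sum is exact, and the only remaining question is how to fit it into a double.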
How do I handle that if I still need to store the result as a double-precision value with a 52-bit mantissa?
The next step after the addition is to normalize: adjust the exponent so that the leading significant bit is immediately before the binary point. In this case the binary point moves two places to the left, so you add two to the exponent (2^-4 becomes 2^-2).
The new significand is 1.00110011001100110011001100110011001100110011001100111, with exponent -2.
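The normalization step can be sketched in Python as follows. This is a minimal illustration, not a full floating-point implementation: `parse_binary` and `normalize` are hypothetical helpers, and the sketch assumes a positive value. Moving the point left by `shift` places divides the significand by 2**shift and multiplies 2**exp by the same amount, so the value never changes.

```python
def parse_binary(s: str) -> tuple[int, int]:
    """Split "11.0101" into (n, k) so that the value is n / 2**k exactly."""
    whole, _, frac = s.partition(".")
    return int(whole + frac, 2), len(frac)


def normalize(n: int, k: int, exp: int) -> tuple[int, int, int]:
    """Rewrite (n / 2**k) * 2**exp so that exactly one bit precedes the point.

    The value is unchanged; only the split between significand and exponent
    moves. Assumes n > 0.
    """
    shift = (n.bit_length() - k) - 1   # integer bits currently present, minus one
    return n, k + shift, exp + shift


# Unnormalized sum from the question: 100.110011... x 2**-4.
n, k = parse_binary("100.110011001100110011001100110011001100110011001100111")
n, k, exp = normalize(n, k, -4)

digits = format(n, "b")                # k + 1 digits once normalized
significand = digits[0] + "." + digits[1:]
print(significand, "x 2^", exp)
```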
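Now round to 53 significant bits (the implicit leading 1 plus the 52 stored mantissa bits). The significand above has 54 significant bits, so only its final 1 is dropped, and the result is adjusted according to the rounding mode. With the default round-to-nearest (ties to even), the dropped bit is exactly half a unit in the last place, which is a tie; because the last retained bit is 1 (odd), you round up. The stored result is 1.0011001100110011001100110011001100110011001100110100 x 2^-2.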
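The rounding step can be sketched in Python with round-to-nearest, ties-to-even (the IEEE 754 default) implemented by hand on an integer. `round_half_even` is a hypothetical helper, the sketch ignores subnormals and exponent overflow, and the final comparison against `0.1 + 0.2` assumes, as the bit patterns suggest, that the operands are the doubles nearest 0.1 and 0.2.

```python
import math


def round_half_even(n: int, drop: int) -> int:
    """Remove the lowest `drop` bits of n, rounding to nearest with ties to even."""
    if drop <= 0:
        return n << -drop               # nothing to round; pad with zeros
    kept = n >> drop
    rest = n & ((1 << drop) - 1)        # the bits being discarded
    half = 1 << (drop - 1)              # exactly half a unit in the last place
    if rest > half or (rest == half and kept & 1):
        kept += 1
    return kept


# Normalized significand from the answer, with exponent -2; removing the point
# gives an integer with the same binary digits.
significand = "1.00110011001100110011001100110011001100110011001100111"
exp = -2
n = int(significand.replace(".", ""), 2)

drop = n.bit_length() - 53             # 1 implicit bit + 52 stored bits
m = round_half_even(n, drop)
if m.bit_length() > 53:                # rounding carried into a new leading bit
    m >>= 1                            # the bit shifted out is 0 in this case
    exp += 1

stored_fraction = format(m, "b")[1:]   # the 52 bits that go in the mantissa field
value = math.ldexp(m, exp - 52)        # m / 2**52 * 2**exp, exact since m < 2**53
print(drop, stored_fraction, value, value == 0.1 + 0.2)
```

Doing the arithmetic on integers keeps every intermediate value exact, so the only rounding happens in one explicit place, mirroring what the hardware does after an addition.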