Reputation: 522

Excel - why two equal integers have non-zero difference?

I have =2*10^15 in one cell which evaluates to 2,000,000,000,000,000 and I have =2+2*10^15 in another cell which also evaluates to 2,000,000,000,000,000, which is fine since the precision is only 15 digits. But when I then subtract one from another, I get back 2, which I dont understand where it came from. As far as I can tell 2*10^15has an exact binary representation, so I would have thought that when adding 2 to it, it just "gets dropped". But it seems that my understanding is incorrect.

Actually, as far as I can tell, the binary representation of 2E+15 has 1 in places corresponding to powers of 2 equal to {50,49,48,44,43,41,39,38,37,36,35,34,32,30,27,24,23,19,18,16}, then why can't we just add another bit in place correspondign to power 1 and have an exact representation of 2+2E+15?

Furthermore, if instead of 2 I add 1, then the difference between 2*10^15 and 1+2*10^15 is 0 in Excel spreadsheet, but if I do the same calculation in Excel VBA, then I get back 1. What is that about?

I have read (several times) "What Every Computer Scientist Should Know About Floating-Point Arithmetic" but the discussion there seems to be focused on decimals rather than integers or I am just not getting it.

Add 1

I remembered that a while ago I wrote a function in VBA to get the binary representation of a number stored by Excel (the function could be found here Extract mantissa, exponent and sign data from IEEE-754 double in VBA). So if I apply that function to 2 + 2E+15 it does return a binary with 1 in the 4th last place in mantissa (not sure why 4th, though) which is not there when the function is applied to 2E+15, so it does seem that Excel simply choses to display number with 15 digits of precision even though the underlying stored floating number has a larger precision.

Add 2

It turns out that the reason that the extra bit in the mantissa of 2 + 2E+15 is in the 4th place (instead of 2nd) is that the mantissa gets "normalised" by dropping the leading (leftmost) bit (since it is always 1) and adjusting the length of the mantissa to 52 bits by shifting everything left which adds 2 0s at the end to the "simple" binary representation so that 1110-0011-0101-1111-1010-1001-0011-0001-1010-0000-0000-0000-010 becomes 1100-0110-1011-1111-0101-0010-0110-0011-0100-0000-0000-0000-1000.

It seems that due to this normalisation of the mantissa, the precision in the floating point double representation is actually 18 decimal places and, for example, the number 1.75 + 2E+15 can be represented exactly by mantissa 1100-0110-1011-1111-0101-0010-0110-0011-0100-0000-0000-0000-0111.

Just need to figure out what is going on with the exponent... There is something called "biasing" being done to it...

Upvotes: 2

Answers (2)

John F

Reputation: 276

Excel stores numbers as double precision, which uses 64 bits in total. One bit is the sign, 11 bits are the exponent from -1024 to 1024 (in base 2) and the remaining 52 bits are for the fraction.

Using this method, the most number of significant figures that a number can be converted to a double precision and back again and GUARANTEED to be unchanged is 15. For 16 and 17 significant figures, this only holds for some numbers. I assume that this is why excel will only show data in the worksheet to 15 significant figures. Note that according to the specs, Excel will STORE values as a result of a formula to higher precision than it DISPLAYS; [https://support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3]

Now in theory we should be able to exactly represent integers to 2^53 = 9,007,199,254,740,992. However, it appears Excel only stores integers exactly up to 2^50 = 1,125,899,906,842,624. Try subtracting this number from one plus this number and you get zero. Anything less than this number, you get one as expected. Then try 2^ 51 = 2,251,799,813,685,248, and subtract this number from 2 or 3 plus this number, you get zero. However if you subtract this number from 4 plus this number, you get 4. And so on.

Note that your value of 2E+15 lies between these values, so hence a difference of 1 won't register, but any greater difference will.

This article explains this explicitly: https://en.wikipedia.org/wiki/Double-precision_floating-point_format

Upvotes: 0

Samuel Hulla

Reputation: 7099

So, I did some research, because I was intrigued by this question. While I can't quite guarantee you this answer is absolutely correct, it is most likely the closest we will get to a correct answer (unless some expert steps in).

I'll try to dissect the question:

I have =2*10^15 in one cell which evaluates to 2,000,000,000,000,000 and I have =2+2*10^15 in another cell which also evaluates to 2,000,000,000,000,000, which is fine since the precision is only 15 digits. But when I then subtract one from another, I get back 2, which I dont understand where it came from. As far as I can tell 2*10^15has an exact binary representation, so I would have thought that when adding 2 to it, it just "gets dropped". But it seems that my understanding is incorrect.

I think before we continue with anything, I think it's important to state, that Excel (or other MS products in general) are notoriously underwhelming when it comes to float-point precision. Some inaccuracies have to be accounted for here.

Also, this is actually one of those things, where the display is misleading. Where the number that is displayed on your screen is actually not the same number that you are calculating with. What I mean under that, in your case, the 2E+15 and 2E+15 albeit displayed the same are not necessarily the same number! (as weird as it sounds..) They are just displayed as the same number, because the floating e^th position is so small, that excel considers it to be insignificant to be displayed

Furthermore, if instead of 2 I add 1, then the difference between 2*10^15 and 1+2*10^15 is 0 in Excel spreadsheet, but if I do the same calculation in Excel VBA, then I get back 1. What is that about?

This is the trickiest part to answer. While I can't say this with certainty, my guess is, that is because the 1 at the end of the float-point arithmetic on the number 1 is rounded down to zero, while the 2 is left as it is. In general, this seems like one of those weird Excel float things, that are hard to explain without expertise and that just happen.

I have read (several times) "What Every Computer Scientist Should Know About Floating-Point Arithmetic" but the discussion there seems to be focused on decimals rather than integers or I am just not getting it.

You already are working with decimals (doubles to be precise). The "Number" format in Excel automatically converts your number to a data-type depending on your need. You are however absolutely not working with integers

Integers are defined in a range of -2,147,483,648 to 2,147,483,647. That means, our number far exceeds the max integer range. If we were to convert both numbers to integers, both would be floored to the number 2,147,483,647, ending up with 0 difference Closest representation of this I can reproduce is using the =FLOOR.MATH() formula, which rounds down to nearest multiple.

Last but not least, it's important to understand how the mathematical functions in excel are calculated. If you have a formula eg. 2+2*10^-15+2 the result calculated would be 4. Why? Operator precedence: Where * has precedence over +, and ^ over *

So that ^ means, if we calculated 2+2*10^15 - 2*10^15 is first concatenated together as a mathematical formula and calculated. That would result to x.......2 - x........0, albeit on our excel display it would show as 2E+15 - 2E+15

I created the following table for illustration:

Also note, using the =FLOOR.MATH() formula, we converted the doubles to the closest multiples [not integers!] (if we used =INT() we would only receive #VALUE error for exceeding integer range). The difference between =FLOOR.MATH(N1) - FLOOR.MATH(N2) is 0, although N2 - N1 is not, because we are counting with different numbers and precision. While this is not precisely the equivalent of working with integers (since they are not... obviously), they are the closest representation I can think of what would happen if we were to calculate with them.

One thing that however baffles me, the =FLOOR.MATH() function, of N2 seems to return with .00 decimal, while N1 doesn't. Albeit, both should be precisely same numbers (closest multiple). Not sure if this is just a formatting glitch or anythign of significance.

So to sum it up, I know this answer rather raises more questions than answers, but hopefully it will be a good lift-off ramp for somebody else to answer this question for good and at least answered some of your questions if not all of them.

Upvotes: 1

Excel - why two equal integers have non-zero difference?

Answers (2)

Related Questions