Reputation:
Is there a difference between this:
average = (x1+x2)/2;
deviation1 = x1 -average;
deviation2 = x2 -average;
variance = deviation1*deviation1 + deviation2*deviation2;
and this:
average2 = (x1+x2);
deviation1 = 2*x1 -average2;
deviation2 = 2*x2 -average2;
variance = (deviation1*deviation1 + deviation2*deviation2) / 4;
Note that in the second version I am trying to delay division as late as possible. Does the second version [delay divisions] increase accuracy in general?
Snippet above is only intended as an example, I am not trying to optimize this particular snippet.
BTW, I am asking about division in general, not just by 2 or a power of 2, since those reduce to a simple exponent adjustment in the IEEE 754 representation. I used division by 2 only to illustrate the issue with a very simple example.
Upvotes: 9
Views: 2284
Reputation: 612963
There's nothing to be gained from this. You are only changing the scale, but you don't get any more significant figures in your calculation.
The Wikipedia article on variance explains at a high level some of the options for calculating variance in a robust fashion.
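To give a feel for what "robust" means here, this is a minimal sketch of Welford's online algorithm, one of the approaches commonly described for this problem. The function name `welford_variance` is made up for this example, and it returns the population variance (dividing by n), which is an assumption about what you want.

```python
def welford_variance(values):
    """Population variance via Welford's single-pass update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / count     # update the running mean
        m2 += delta * (x - mean)  # uses both the old and the new mean
    if count == 0:
        raise ValueError("variance requires at least one value")
    return m2 / count


# Values with a large common offset, where a naive sum-of-squares formula struggles
print(welford_variance([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]))
```

The point is that robustness comes from how the deviations are accumulated, not from where the division happens.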
Upvotes: 4
Reputation: 881463
You do not gain precision from this since IEEE754 (which is probably what you're using under the covers) gives you the same precision (number of bits) at whatever scale you're working. For example 3.14159 x 10^7 will be as precise as 3.14159 x 10^10.
The only possible advantage (of the former) is that you may avoid overflow when setting the deviations. But, as long as the values themselves are less than half of the maximum possible, that won't be a problem.
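Both points can be checked with a rough sketch like the one below. It assumes Python 3.9 or later (for `math.ulp`) and IEEE 754 doubles; `relative_spacing` is a helper name invented for this example, and the sample values near 1e308 are picked only to trigger the overflow.

```python
import math


def relative_spacing(x):
    """Gap to the next representable double, relative to x itself."""
    return math.ulp(x) / x


# The relative spacing stays between 2**-53 and 2**-52 at either scale
for x in (3.14159e7, 3.14159e10):
    print(x, relative_spacing(x))

# Overflow risk: in the delayed-division form, 2*x1 exceeds the largest double
x1, x2 = 1e308, 0.5e308
early = x1 - (x1 + x2) / 2   # divide first: stays finite
late = 2 * x1 - (x1 + x2)    # delay the division: 2*x1 is already inf
print(early, late)
```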
Upvotes: 2
Reputation: 91092
The best way to answer your question would be to run tests (both randomly-distributed and range-based?) and see if the resulting numbers differ at all in the binary representation.
Note that one issue you'll have if you do this is that your functions won't work for values > MAX_INT/2, because of the way you code the average.
avg = (x1+x2)/2 # clobbers numbers > MAX_INT/2
avg = 0.5*x1 + 0.5*x2 # no clobbering
This is almost certainly not an issue, though, unless you are writing a language-level library. And if most of your numbers are small, it may not matter at all. In fact it probably isn't worth considering, since the value of the variance will exceed MAX_INT first, as it is inherently a squared quantity; I'd say you might wish to use standard deviation, but no one does that.
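Here is a small sketch of the same clobbering with floats. It assumes IEEE 754 doubles, where the relevant limit is `sys.float_info.max` rather than MAX_INT (Python ints are arbitrary precision, so they don't overflow this way).

```python
import sys

big = sys.float_info.max  # largest finite double

naive = (big + big) / 2          # the sum overflows to inf before the division
scaled = 0.5 * big + 0.5 * big   # each term is halved first, so the sum stays finite

print(naive)
print(scaled == big)
```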
Here I do some experiments in python (which I think supports the IEEE whatever-it-is by virtue of probably delegating math to C libraries...):
>>> from itertools import product
>>> def compare(numer, denom):
...     assert ((numer/denom)*2).hex()==((2*numer)/denom).hex()
...
>>> [compare(a,b) for a,b in product(range(1,100),range(1,100))]
No problem, I think because division and multiplication by 2 is nicely representable in binary. However try multiplication and division by 3:
>>> def compare(numer, denom):
...     assert ((numer/denom)*3).hex()==((3*numer)/denom).hex(), '...'
...
>>> [compare(a,b) for a,b in product(range(1,100),range(1,100))]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 2, in compare
AssertionError: 0x1.3333333333334p-1!=0x1.3333333333333p-1
Does it matter much? Probably not, except perhaps if you're working with very small numbers (in which case you may wish to use log arithmetic). However, if you're working with large numbers (uncommon in probability) and you delay division, you will, as I mentioned, risk overflow and, even worse, risk bugs due to hard-to-read code.
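To see which side of that assertion is actually closer to the true value, you can compare both against exact rational arithmetic. This is only a sketch for the same small-integer range as above; `rounding_errors` is a name invented here. For these inputs the product `3*numer` is exact, so the late-division form rounds only once; that does not carry over to arbitrary float inputs, where the multiplication itself may round.

```python
from fractions import Fraction
from itertools import product


def rounding_errors(numer, denom, k=3):
    """Return (early, late) absolute errors versus the exact value k*numer/denom."""
    exact = Fraction(k * numer, denom)
    early = abs(Fraction((numer / denom) * k) - exact)  # divide first, then scale
    late = abs(Fraction((k * numer) / denom) - exact)   # scale first, then divide
    return early, late


errors = [rounding_errors(a, b) for a, b in product(range(1, 100), repeat=2)]
early_worse = sum(1 for early, late in errors if early > late)
late_worse = sum(1 for early, late in errors if late > early)
print(f"early division worse: {early_worse}, late division worse: {late_worse}")
```

Any difference is at most a unit or so in the last place, which is why it rarely matters in practice.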
Upvotes: 1
Reputation: 3031
I have to agree with David Heffernan, it won't give you a higher precision.
The reason is how float values are stored. You have some bits representing the significant digits and some bits representing the exponent (for example 3.1714 x 10^-12). The number of bits for the significant digits is always the same no matter how large your number is, which means that in the end the result will not really be any different.
Even worse - delaying the division can get you an overflow if you have very large numbers.
If you really need higher precision, there are lots of libraries that support large numbers or numbers with higher precision.
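As one example of such a library, Python ships the `decimal` module in its standard library. The sketch below mirrors the first snippet from the question; the 50-digit precision and the function name `variance_two` are arbitrary choices made for illustration.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits; an arbitrary choice


def variance_two(x1, x2):
    """Sum of squared deviations, as in the question's first snippet."""
    average = (x1 + x2) / 2
    deviation1 = x1 - average
    deviation2 = x2 - average
    return deviation1 * deviation1 + deviation2 * deviation2


# Decimal inputs are built from strings so no binary rounding sneaks in
print(variance_two(Decimal("0.1"), Decimal("0.3")))
# The same function also accepts plain floats, for comparison
print(variance_two(0.1, 0.3))
```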
Upvotes: 1