Reputation: 106
I'm trying to get an average histogram from my list of histograms, which are stored as NumPy arrays. I'm playing around with different methods and getting very different results for sum, np.sum and fsum. I'm not sure why, or which one is 'correct'.
Upvotes: 1
Views: 826
Reputation: 152677
The problem with summation of doubles is that doubles have limited precision, so you might get different results depending on how the sum is computed, especially if the values differ a lot in magnitude. The Wikipedia article on the Kahan summation algorithm is worth reading if you're really interested (or see this link for some alternative implementations).
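To make the idea of compensated summation concrete, here is a minimal sketch. It uses Neumaier's variant of the Kahan algorithm rather than the textbook Kahan loop, and the names naive_sum and neumaier_sum are my own, not from any library. Treat it as an illustration of the technique, not as what math.fsum or numpy.sum do internally.

```python
import math

def naive_sum(values):
    """Plain left-to-right float accumulation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total

def neumaier_sum(values):
    """Compensated summation (Neumaier's variant of Kahan summation)."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        x = float(x)
        t = total + x
        if abs(total) >= abs(x):
            # The low-order digits of x were lost in the addition.
            compensation += (total - t) + x
        else:
            # The low-order digits of total were lost in the addition.
            compensation += (x - t) + total
        total = t
    return total + compensation

data = [1e16, 1.0, -1e16]   # the two large values cancel, leaving 1.0
print(naive_sum(data))      # 0.0
print(neumaier_sum(data))   # 1.0
print(math.fsum(data))      # 1.0
```

The extra variable only tracks the rounding error of each addition, so the cost is a few more floating point operations per element.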
math.fsum will probably give the most correct result when summing doubles. It's slower than the other approaches, though.
numpy.sum isn't that good. It currently uses pairwise summation, which is a bit better than a naive implementation, and it's quite fast. However, the result might not be entirely accurate.
sum is just a naive summation. It's often faster than fsum, but it's the worst of the three approaches when it comes to precision.
Floats have limited precision, which in a lot of cases makes them unsuitable for perfectly accurate results anyway (if you need perfectly accurate results you'll need to use Decimal or Fraction).
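As a small illustration of the Decimal and Fraction remark, the sketch below adds one tenth ten times. The variable names are arbitrary. Note that Decimal is only exact for values that have a finite decimal representation within the context precision, while Fraction is exact for any rational input.

```python
from decimal import Decimal
from fractions import Fraction

# Naive float accumulation: 0.1 has no exact binary representation.
float_total = 0.0
for _ in range(10):
    float_total += 0.1

# Exact rational and decimal arithmetic on the same values.
fraction_total = sum([Fraction(1, 10)] * 10)
decimal_total = sum([Decimal("0.1")] * 10)

print(float_total == 1.0)    # False
print(fraction_total == 1)   # True
print(decimal_total == 1)    # True
```

Both exact types are much slower than floats, so they only make sense when exactness really matters.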
However the limited precision of intermediate results is another source of error that can totally skew the result of a (naive) summation:
>>> import numpy as np
>>> import math
>>> a = [1, 1e20, 1, -1e20] # the 1e20 and -1e20 cancel each other.
>>> sum(a)
0.0
>>> np.sum(a)
0.0
>>> math.fsum(a)
2.0
In this case only math.fsum gives the expected result of 2.
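Applied to the original question, one possible way to average a list of histograms with math.fsum is to sum each bin separately. This is only a sketch: mean_histogram is a hypothetical helper name, and it assumes every histogram has the same number of bins.

```python
import math

def mean_histogram(histograms):
    """Average equal-length histograms bin by bin using math.fsum."""
    if len(histograms) == 0:
        raise ValueError("need at least one histogram")
    n = len(histograms)
    # zip(*histograms) groups the values belonging to the same bin.
    return [math.fsum(bin_values) / n for bin_values in zip(*histograms)]

hists = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]
print(mean_histogram(hists))  # [2.0, 3.0, 4.0]
```

The NumPy counterpart would be np.mean(histograms, axis=0), which is usually fine when the bin values have similar magnitudes. The per-bin math.fsum version trades speed for accuracy when they don't.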
Upvotes: 4