Salmon

Reputation: 106

Getting different results when summing a list of arrays with sum, np.sum and fsum?

I'm trying to compute an average histogram from my list of histograms, which are stored as NumPy arrays. I'm playing around with different methods and getting very different results for sum, np.sum and fsum. I'm not sure why, or which one is 'correct'.

Upvotes: 1

Views: 826

Answers (1)

MSeifert

Reputation: 152677

The problem with summing doubles is that doubles have limited precision, so different summation orders and algorithms can give different results, especially when the values differ a lot in magnitude. The Wikipedia article on the Kahan summation algorithm is worth reading if you're really interested (or see this link for some alternative implementations).

  • math.fsum will probably give the most correct result when summing doubles. It's slower than the other approaches though.
  • numpy.sum is a middle ground. It currently uses pairwise summation, which is a bit better than a naive implementation and still quite fast. However, the result might not be entirely accurate.
  • sum is just a naive summation. It's often faster than fsum but it's the worst of the three approaches when it comes to precision.
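To make the idea of compensated summation from the Kahan article more concrete, here is a minimal sketch of Neumaier's variant of the algorithm. The function name `neumaier_sum` is made up for this illustration, and this is not how math.fsum or numpy.sum are implemented; it only shows how tracking the lost low-order bits can recover the small terms.

```python
import math

def neumaier_sum(values):
    """Compensated summation (Neumaier's variant of Kahan's algorithm)."""
    total = 0.0         # running sum
    compensation = 0.0  # accumulated low-order bits lost from `total`
    for value in values:
        value = float(value)
        new_total = total + value
        if abs(total) >= abs(value):
            # Low-order digits of `value` were lost in the addition.
            compensation += (total - new_total) + value
        else:
            # Low-order digits of `total` were lost instead.
            compensation += (value - new_total) + total
        total = new_total
    return total + compensation

a = [1, 1e20, 1, -1e20]
print(neumaier_sum(a))  # 2.0
print(math.fsum(a))     # 2.0
```

Unlike math.fsum, a compensated sum like this is not guaranteed to be correctly rounded for every input, so it should be seen as an improvement over naive summation rather than a replacement for fsum.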

Floats have limited precision, which in a lot of cases makes them unsuitable for perfectly accurate results anyway (if you need perfectly accurate results you'll need to use decimal.Decimal or fractions.Fraction).
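As a rough illustration of the exact-arithmetic route, the sketch below sums values as fractions.Fraction objects. The helper name `exact_sum` is hypothetical. Note that converting a float to a Fraction preserves the float's exact binary value, so any rounding that happened when the float was created is still there; only the summation itself becomes exact.

```python
from fractions import Fraction

def exact_sum(values):
    """Sum values exactly by converting each one to a Fraction first."""
    return sum(Fraction(value) for value in values)

a = [1, 1e20, 1, -1e20]
total = exact_sum(a)
print(total)         # 2
print(float(total))  # 2.0

# Building Fractions from strings avoids the binary rounding of literals:
print(Fraction("0.1") + Fraction("0.2") == Fraction("0.3"))  # True
```

This is much slower than floating-point summation, so it is mostly useful for checking results or for small inputs.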

However the limited precision of intermediate results is another source of error that can totally skew the result of a (naive) summation:

>>> import numpy as np
>>> import math
>>> a = [1, 1e20, 1, -1e20]  # the 1e20 and -1e20 cancel each other.
>>> sum(a)
0.0
>>> np.sum(a)
0.0
>>> math.fsum(a)
2.0

In this case only math.fsum gives the expected result of 2.
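To see why a pairwise scheme does not rescue this particular input, here is a textbook recursive pairwise summation. This is only a sketch: the name `pairwise_sum` is made up, and NumPy's real implementation differs in details such as how it blocks the data. Each half of the list already loses its 1 before the two large values get a chance to cancel.

```python
def pairwise_sum(values):
    """Textbook pairwise summation: split in half, sum each half, add."""
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return float(values[0])
    mid = n // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

a = [1, 1e20, 1, -1e20]
# Left half:  1 + 1e20    -> 1e20   (the 1 is rounded away)
# Right half: 1 + (-1e20) -> -1e20  (the 1 is rounded away)
print(pairwise_sum(a))  # 0.0
```

Pairwise summation mainly helps with the slow accumulation of rounding errors over many terms of similar magnitude; it does not protect against cancellation between a few huge values.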

Upvotes: 4
