Reputation: 499
I ran a sum over a matrix with small numbers (here: https://gist.github.com/anonymous/7746735) but the sum doesn't agree if I sum in different directions:
>>> a = np.genfromtxt('arr.txt', delimiter=' ')
>>> a.shape
(30, 86)
>>> a.sum()
7.2164496600635175e-16
>>> a.sum(0).sum()
3.8857805861880479e-16
>>> a.sum(1).sum()
7.6327832942979512e-16
But if I sum a small matrix, the results agree:
>>> b = np.array([[1,2,3],[4,5,6]])
>>> b.sum()
21
>>> b.sum(0).sum()
21
>>> b.sum(1).sum()
21
What causes this problem? And which sum is correct for the first matrix? Thanks!
Upvotes: 1
Views: 179
Reputation: 612884
This is due to the inherent imprecision in floating point arithmetic. Mathematically, the associative law holds for addition. Namely:
a + (b + c) = (a + b) + c
But that is not true for floating point arithmetic on a finite machine, so when you sum the elements in a different order you can get different answers.
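You can see this directly in Python with three innocuous-looking values (this is an illustrative sketch, not taken from the matrix in the question):

```python
a, b, c = 0.1, 0.2, 0.3

# Mathematically these are equal; in double precision they are not.
left = (a + b) + c
right = a + (b + c)
print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```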
You might wonder how it can be that the associative law does not hold for floating point arithmetic. It comes down to the fact that not all real numbers are exactly representable. This might come as a surprise, but even a number as apparently simple as 0.1 cannot be represented exactly in a binary floating point type.
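The `decimal` module can show the exact value that the double closest to 0.1 actually holds:

```python
from decimal import Decimal

# Decimal(0.1) converts the binary double exactly, exposing the rounding
# that happened when the literal 0.1 was parsed.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```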
So, when the computer calculates a + b, the true exact result might not be representable. The computer does the best it can and gives you the closest number to the exact value that is exactly representable. And this is where the imprecision arises.
Required reading on this subject: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Which sum is correct for the first matrix?
That is impossible to say from here, although almost certainly none of them is exactly correct. It's quite likely that the values you hold in the matrix a are already approximations to the true values, so even the definition of what you mean by "correct" is hard to pin down.
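If you want a sum that is less sensitive to ordering, Python's standard library offers math.fsum, which tracks the rounding error during accumulation (the values below are a made-up illustration, not your data):

```python
import math

# Ten copies of 0.1 should sum to exactly 1.0, but naive
# left-to-right accumulation drifts.
vals = [0.1] * 10
print(sum(vals))        # 0.9999999999999999
print(math.fsum(vals))  # 1.0
```

For a NumPy array you could similarly try math.fsum(a.ravel()) to get one order-independent reference value, at the cost of speed.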
Upvotes: 4