Reputation: 135
Ubuntu16.04_64bit + Python3.5.2 + numpy1.13.3 + scipy1.0.0
I've run into this problem with matrix multiplication between a scipy.sparse.csc.csc_matrix and a numpy.ndarray. Here is an example:
import numpy as np
import scipy.sparse
# np.random.random takes the shape as a single tuple
a = np.random.random((1000, 1000))
b = np.random.random((1000, 2000))
da = scipy.sparse.csc_matrix(a)
db = scipy.sparse.csc_matrix(b)
ab = a.dot(b)       # dense * dense
dadb = da.dot(db)   # sparse * sparse
dab = da.dot(b)     # sparse * dense
Comparing the results element by element gives:
In [31]: np.sum(dadb.toarray() != ab)
Out[31]: 1869078
In [33]: np.sum(dab != dadb.toarray())
Out[33]: 0
In [34]: np.sum(dab != ab)
Out[34]: 1869078
Why? What causes the difference between them, and what should I do about it?
Upvotes: 2
Views: 176
Reputation: 86533
What you are seeing is typical of floating point arithmetic (for a great explanation, see What Every Computer Scientist Should Know About Floating-Point Arithmetic or the answers to Why Are Floating Point Numbers Inaccurate?). Unlike real arithmetic, the order of operations in floating point arithmetic will (slightly) change the results, because rounding errors accumulate in different ways. What this means is that different ways of computing the same result cannot be expected to agree exactly, but they will agree approximately.
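As a minimal sketch of the order-of-operations effect, using plain Python floats rather than the matrices from the question (the names x and y are only for illustration): adding the same three numbers with different grouping gives results that are not exactly equal, yet agree to within rounding error.

```python
import math

# Same three numbers, summed with different grouping.
x = (0.1 + 0.2) + 0.3
y = 0.1 + (0.2 + 0.3)

print(x, y)                # 0.6000000000000001 0.6
print(x == y)              # False: the rounding errors differ
print(math.isclose(x, y))  # True: they agree approximately
```

A matrix product is a long chain of such additions, and the dense and sparse code paths are not guaranteed to perform them in the same order.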
You can see this agreement in the example from the question by using np.allclose instead of exact equality:
>>> np.allclose(dab, ab)
True
>>> np.allclose(dadb.toarray(), ab)
True
In short, these operations are behaving as expected.
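To check how small the discrepancy actually is, one option is to look at the largest absolute difference between the two products. The sketch below assumes smaller matrices and a fixed seed (both chosen here only to keep it quick and repeatable); the exact value printed depends on the platform and the underlying BLAS library.

```python
import numpy as np
import scipy.sparse

# Smaller shapes and a fixed seed are assumptions made for this sketch.
rng = np.random.RandomState(0)
a = rng.random_sample((200, 200))
b = rng.random_sample((200, 300))
da = scipy.sparse.csc_matrix(a)

ab = a.dot(b)    # dense * dense
dab = da.dot(b)  # sparse * dense

# Largest element-wise discrepancy between the two code paths.
max_abs_diff = np.max(np.abs(dab - ab))
print(max_abs_diff)
print(np.allclose(dab, ab))  # True
```

If the default tolerances of np.allclose are too loose or too strict for your data, its rtol and atol parameters can be adjusted.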
Upvotes: 5