Reputation: 260
I have two (scipy) CSR sparse matrices:
A (12414693, 235470)
B (235470, 48063)
Performing:
A.dot(B)
causes a segmentation fault.
What am I doing wrong?
EDIT
I've submitted a bug to the scipy developer community: https://github.com/scipy/scipy/issues/3212
Upvotes: 3
Views: 766
Reputation: 67417
Your problem is very likely caused by an overflow of an index stored in an int32: the result of your dot product probably has more than 2^31 - 1 non-zero entries. Try the following:
>>> import numpy as np
>>> import scipy.sparse
>>> c = np.empty_like(A.indptr)
>>> scipy.sparse.sparsetools.csr_matmat_pass1(A.shape[0], B.shape[1], A.indptr,
...                                           A.indices, B.indptr, B.indices, c)
>>> np.all(np.diff(c) >= 0)
With your data, the array c is a vector of 12414693 + 1 items holding the accumulated number of non-zero entries per row in the product of your two matrices, i.e. it is what C.indptr will be if C = A.dot(B) finishes successfully. It is of type np.int32, even on 64-bit platforms, which is not good. If your sparse matrix is too large, there will be an overflow, that last line will return False, and the arrays that store the result of the matrix product will be instantiated with the wrong size (the last item of c, which is very likely to be a negative number if an overflow did happen). If that's the case, then yes, file a bug report...
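If you would rather not call into sparsetools, a rough version of the same check can be done with public attributes only. The sketch below is an assumption on my part, not what scipy does internally: the helper name product_nnz_upper_bound is made up, and it returns an upper bound on the number of stored entries in the product (accumulated in int64), not the exact count.

```python
import numpy as np
import scipy.sparse as sp


def product_nnz_upper_bound(A, B):
    """Upper bound on the stored entries of A.dot(B) (hypothetical helper)."""
    A = sp.csr_matrix(A)
    B = sp.csr_matrix(B)
    # Stored entries in each row of B, widened to int64 so sums cannot wrap.
    b_row_nnz = np.diff(B.indptr).astype(np.int64)
    # A stored A[i, k] can add at most nnz(B[k, :]) entries to row i of the product.
    per_entry = b_row_nnz[A.indices]
    # Prefix sums let us total per_entry over each row of A, empty rows included.
    running = np.concatenate(([0], np.cumsum(per_entry, dtype=np.int64)))
    per_row = running[A.indptr[1:]] - running[A.indptr[:-1]]
    # A row of the product can never hold more than B.shape[1] entries.
    per_row = np.minimum(per_row, B.shape[1])
    return int(per_row.sum())


# Tiny stand-in matrices; substitute your own A and B.
A = sp.csr_matrix(np.array([[1, 0, 2], [0, 0, 0], [0, 3, 0]]))
B = sp.csr_matrix(np.array([[1, 1], [0, 1], [0, 0]]))

bound = product_nnz_upper_bound(A, B)
# If even the upper bound fits in an int32, the index overflow cannot be the cause.
fits_in_int32 = bound <= np.iinfo(np.int32).max
print(bound, fits_in_int32)
```

If the bound comes out below the int32 maximum, the overflow explanation is ruled out; if it comes out above, the overflow is possible but not proven, since this is only an upper bound.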
Upvotes: 4
Reputation: 77404
This link may be helpful: <http://blog.newsle.com/2013/02/01/text-classification-and-feature-hashing-sparse-matrix-vector-multiplication-in-cython/>. The product of these two matrices will be too large. I'm not sure whether the article's advice applies to you, but you could try storing the second matrix in CSC format.
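For reference, converting to CSC is a single call. This is a minimal sketch on tiny stand-in matrices (A_small and B_small are made-up names, not your data). Note that the product contains the same entries whichever format the second operand is stored in, so this changes how the work is done, not how large the result is.

```python
import numpy as np
import scipy.sparse as sp

# Small stand-ins for the real matrices (illustrative values only).
A_small = sp.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))
B_small = sp.csr_matrix(np.array([[0.0, 1.0], [4.0, 0.0], [0.0, 5.0]]))

# Re-store the second operand column by column.
B_csc = B_small.tocsc()

C_from_csr = A_small.dot(B_small)
C_from_csc = A_small.dot(B_csc)

# Both products hold the same values.
same = np.array_equal(C_from_csr.toarray(), C_from_csc.toarray())
print(B_csc.format, same)
```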
Upvotes: 2