Omer
Omer

Reputation: 260

Dot product of csr_matrix causes segmentation fault

I have two (scipy) CSR sparse matrices:

A (12414693, 235470)
B (235470, 48063)

Performing:

A.dot(B)

causes a segmentation fault.

What am I doing wrong?

EDIT

I've submitted a bug to the scipy developer community: https://github.com/scipy/scipy/issues/3212

Upvotes: 3

Views: 766

Answers (2)

Jaime
Jaime

Reputation: 67417

Your problem is very likely being caused by an overflow of an index stored in an int32, caused by the result of your dot product having more than 2^31 non-zero entries. Try the following...

>>> import scipy.sparse
>>> c = np.empty_like(A.indptr)
>>> scipy.sparse.sparsetools.csr_matmat_pass1(A.shape[0], B.shape[1], A.indptr,
                                              A.indices, B.indptr, B.indices, c)
>>> np.all(np.diff(c) >= 0)

With your data, The array c is a vector of 12414693 + 1 items, holding the accumulated number of non-zero entries per row in the product of your two matrices, i.e. it is what C.indptr will be if C = A.dot(B) finishes successfully. It is of type np.int32, even on 64 bit platforms, which is not good. If your sparse matrix is too large, there will be an overflow, that last line will return False, and the arrays to store the result of the matrix product will be instantiated with a wrong size (the last item of c, which is very likely to be a negative number if an overflow did happen). If that's the case, then yes, file a bug report...

Upvotes: 4

ely
ely

Reputation: 77404

This link may be helpful: < http://blog.newsle.com/2013/02/01/text-classification-and-feature-hashing-sparse-matrix-vector-multiplication-in-cython/ >. The product of these will be too large. I'm not sure if the advice of the article applies for you, but you may try organizing the second matrix as a CSC type.

Upvotes: 2

Related Questions