I am getting a segmentation fault when I multiply a scipy sparse matrix by its transpose. I've searched all over the Internet but could not find an answer. Any help is appreciated.
>>> import cPickle
>>> fs = open('vec.pickle', 'rb')
>>> vec = cPickle.load(fs)
>>> vec
<3020x512 sparse matrix of type '<type 'numpy.float64'>' with 26008 stored elements in Compressed Sparse Column format>
>>> vec.max()
10.0
>>> vec.min()
0.0
>>> vec * vec.T
Segmentation fault: 11
I do not think this is a memory issue, since the dimensions are small. The vec object was created by gensim, if that information helps.
I also do not think this is an overflow issue, since all elements lie in the range [0.0, 10.0].
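A back-of-the-envelope check on the "dimensions are small" argument: `vec * vec.T` is 3020 x 3020, so even if the result came out completely dense it would hold 3020 * 3020 = 9,120,400 float64 values. The helper name below is hypothetical and exists only to make that arithmetic explicit.

```python
import numpy as np


def dense_product_bytes(n_rows, dtype=np.float64):
    """Bytes needed for the values of an n_rows x n_rows result stored densely.

    Hypothetical helper: gives an upper bound on the value storage of
    M @ M.T when M has n_rows rows, ignoring sparse index arrays.
    """
    return n_rows * n_rows * np.dtype(dtype).itemsize


print(dense_product_bytes(3020))  # 72963200
```

That is about 73 MB for the values. A fully populated sparse result would need somewhat more for its index arrays, but that is still a modest amount of memory for a typical desktop machine.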
The pickle object is here: https://drive.google.com/open?id=0B3DJbsn85XMvdmFYT0MzZVFjOVU
When I load this vec and convert it to COO format, I get an error:
In [13]: vec.tocoo()
ValueError Traceback (most recent call ....
226 if self.col.max() >= self.shape[1]:
227 raise ValueError('column index exceedes matrix dimensions')
ValueError: row index exceedes matrix dimensions
So something is faulty in the pickled object.
In [38]: vec
Out[38]:
<3020x512 sparse matrix of type '<type 'numpy.float64'>'
with 26008 stored elements in Compressed Sparse Column format>
In [37]: vec.indices.max()
Out[37]: 3255
By its shape, it is supposed to have 3020 rows and 512 columns. But the indices attribute (which holds row indices in CSC format) goes up to 3255, larger than the number of rows.
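The same kind of consistency test can be written out by hand against the raw `data`, `indices` and `indptr` arrays. This is a minimal sketch for Python 3; `csc_problems` is a hypothetical name, and the checks are roughly the ones scipy's own `check_format(full_check=True)` method performs on compressed matrices, not a copy of that code.

```python
import numpy as np


def csc_problems(data, indices, indptr, shape):
    """Return reasons why raw CSC arrays are inconsistent with `shape`.

    Hypothetical helper. An empty list means no problem was found.
    In CSC format, `indices` holds row indices and `indptr` has one
    entry per column plus one.
    """
    n_rows, n_cols = shape
    data, indices, indptr = map(np.asarray, (data, indices, indptr))
    problems = []
    if len(indptr) != n_cols + 1:
        problems.append(f"indptr has length {len(indptr)}, expected {n_cols + 1}")
    if len(indptr) and indptr[0] != 0:
        problems.append("indptr does not start at 0")
    if np.any(np.diff(indptr) < 0):
        problems.append("indptr is not non-decreasing")
    if len(indices) != len(data):
        problems.append("indices and data have different lengths")
    if len(indptr) and indptr[-1] > len(indices):
        problems.append("indptr[-1] exceeds the number of stored entries")
    if indices.size:
        # The check that fails for the pickled vec: max row index 3255 >= 3020.
        if indices.max() >= n_rows:
            problems.append(f"row index {indices.max()} >= n_rows {n_rows}")
        if indices.min() < 0:
            problems.append("negative row index")
    return problems


# Usage on the pickled matrix would look like:
# csc_problems(vec.data, vec.indices, vec.indptr, vec.shape)
```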
So one question is whether we can recover a valid matrix from this data; another is whether it was valid when it was originally pickled. The fault is more likely in gensim than in scipy.sparse.
Until the matrix passes simple tests like this, I wouldn't draw any conclusions about the vec * vec.T calculation.
I can create a new, valid sparse matrix with:
In [44]: newvec = sparse.csc_matrix((vec.data,vec.indices,vec.indptr))
In [45]: newvec.shape
Out[45]: (3256, 512)
In [46]: newvec * newvec.T
Out[46]:
<3256x3256 sparse matrix of type '<type 'numpy.float64'>'
with 314081 stored elements in Compressed Sparse Column format>
In [47]: newvec.tocoo()
Out[47]:
<3256x512 sparse matrix of type '<type 'numpy.float64'>'
with 26008 stored elements in COOrdinate format>
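When no shape is passed, scipy infers the number of rows as `indices.max() + 1`, which is where 3256 comes from. The sketch below makes that choice explicit and also shows a second, hypothetical option: keep the declared row count and drop the stored entries whose row index is out of range. The function name `rebuild_csc` and the dropping option are illustrative assumptions; dropping entries discards data, and whether either option matches what gensim intended is not known from the pickle alone.

```python
import numpy as np
from scipy import sparse


def rebuild_csc(data, indices, indptr, n_rows=None):
    """Rebuild a well-formed CSC matrix from raw (possibly inconsistent) arrays.

    Hypothetical helper. Assumes indptr itself is consistent.
    - n_rows=None: grow the row count to indices.max() + 1 (keeps all data).
    - n_rows given: keep that row count and DROP entries with row >= n_rows.
    """
    indptr = np.asarray(indptr)
    nnz = int(indptr[-1])
    data = np.asarray(data)[:nnz]
    indices = np.asarray(indices)[:nnz]
    n_cols = len(indptr) - 1

    if n_rows is None:
        n_rows = int(indices.max()) + 1 if indices.size else 0
        return sparse.csc_matrix((data, indices, indptr), shape=(n_rows, n_cols))

    # Column of every stored entry, recovered from the column pointers.
    cols = np.repeat(np.arange(n_cols), np.diff(indptr))
    keep = indices < n_rows
    return sparse.csc_matrix(
        (data[keep], (indices[keep], cols[keep])), shape=(n_rows, n_cols)
    )


# e.g. rebuild_csc(vec.data, vec.indices, vec.indptr)        -> 3256 x 512
#      rebuild_csc(vec.data, vec.indices, vec.indptr, 3020)  -> 3020 x 512
```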
My guess is that the segmentation fault occurs in the compiled matrix-multiplication code. At some point a value from vec.indices is used to address memory beyond the space allocated to a C array. For the sake of speed, the C code does not check bounds as thoroughly as ordinary Python and numpy code does; in effect, the matrix multiplication assumes its inputs are well formed.
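To illustrate the mechanism, here is a pure-Python sparse matrix-vector product over raw CSC arrays. It is a simplified stand-in, not scipy's actual kernel (and `csc_matvec` is a hypothetical name), but it shows the same pattern: each value read from `indices` is used directly as an address into the output. In Python, NumPy catches an out-of-range row index and raises `IndexError`; compiled code that skips that check would instead write past the end of the array, which is one way to get a segmentation fault.

```python
import numpy as np


def csc_matvec(data, indices, indptr, n_rows, x):
    """Compute y = A @ x for a CSC matrix A given as raw arrays.

    Hypothetical, simplified illustration of a sparse kernel's inner loop.
    """
    data, indices, indptr, x = map(np.asarray, (data, indices, indptr, x))
    y = np.zeros(n_rows, dtype=np.result_type(data, x))
    for j in range(len(indptr) - 1):            # loop over columns
        for k in range(indptr[j], indptr[j + 1]):
            # indices[k] is trusted as a row number. NumPy raises IndexError
            # here if indices[k] >= n_rows; unchecked C code would not.
            y[indices[k]] += data[k] * x[j]
    return y


print(csc_matvec([1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3], 3, [1.0, 10.0]))
# [ 1. 30.  2.]
```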