I saved a scipy csr matrix using np.save('X', X). When I load it with np.load('X.npy'), I get this output:
array(<240760x110493 sparse matrix of type '<class 'numpy.float64'>'
with 20618831 stored elements in Compressed Sparse Row format>, dtype=object)
However, I cannot access this data using indexes (X[0,0], X[:10,:10] and X[0] all give IndexError: too many indices for array), and calling .shape returns ().
Is there a way to access this data, or is it corrupt now?
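A minimal way to reproduce the symptom, assuming a small identity matrix as a stand-in for the real data and a temporary directory instead of the working directory. Note that NumPy 1.16.3 and later refuse to unpickle object arrays by default, so the load needs allow_pickle=True:

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Hypothetical small stand-in for the 240760x110493 matrix in the question.
X = sp.csr_matrix(np.eye(4))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X.npy")
    np.save(path, X)  # the sparse matrix ends up pickled inside an object array
    arr = np.load(path, allow_pickle=True)  # required for object arrays on NumPy >= 1.16.3

print(arr.shape, arr.dtype)  # a 0-d array of dtype object

raised = False
try:
    arr[0, 0]  # same failure as X[0,0] in the question
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```

The exact IndexError wording depends on the NumPy version, but the cause is the same: there are no dimensions to index.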
Since there are 3 options to save/load a matrix, I ran a speed comparison to see which works best for my sparse matrix:
%timeit -n1 scipy.io.savemat('tt', {'t': X})
1 loops, best of 3: 66.3 ms per loop
%timeit -n1 scipy.io.mmwrite('tt_mm', X)
1 loops, best of 3: 7.55 s per loop
%timeit -n1 np.save('tt_np', X)
1 loops, best of 3: 188 ms per loop
%timeit -n1 scipy.io.loadmat('tt')
1 loops, best of 3: 9.78 ms per loop
%timeit -n1 scipy.io.mmread('tt_mm')
1 loops, best of 3: 5.72 s per loop
%timeit -n1 np.load('tt_np.npy')
1 loops, best of 3: 150 ms per loop
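The results are that mmread/mmwrite are incredibly slow (hundreds of times slower than savemat/loadmat), and savemat/loadmat is faster than np.save/np.load: about 3 times faster for writing (66.3 ms vs 188 ms) and about 15 times faster for reading (9.78 ms vs 150 ms).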
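The timings only show speed, so here is a sketch that checks that each of the three save/load pairs actually round-trips a sparse matrix. The 50x40 test matrix, the file names and the temporary directory are assumptions for illustration; no timing is done here. Each loader may hand back a different sparse format, so everything is converted to CSR before comparing:

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sp

# Hypothetical test matrix: 50x40 with roughly 10% nonzeros, built deterministically.
rng = np.random.default_rng(0)
dense = rng.random((50, 40))
dense[dense < 0.9] = 0.0
X = sp.csr_matrix(dense)

with tempfile.TemporaryDirectory() as tmp:
    # MATLAB .mat file; sparse data is stored column-wise, so it may reload as CSC.
    mat_path = os.path.join(tmp, "tt.mat")
    scipy.io.savemat(mat_path, {"t": X})
    loaded = {"savemat/loadmat": scipy.io.loadmat(mat_path)["t"]}

    # Matrix Market is a plain-text format.
    mtx_path = os.path.join(tmp, "tt_mm.mtx")
    scipy.io.mmwrite(mtx_path, X)
    loaded["mmwrite/mmread"] = scipy.io.mmread(mtx_path)

    # np.save pickles the sparse matrix inside a 0-d object array; .item() unwraps it.
    npy_path = os.path.join(tmp, "tt_np.npy")
    np.save(npy_path, X)
    loaded["save/load"] = np.load(npy_path, allow_pickle=True).item()

checks = {}
for name, Y in loaded.items():
    Y = sp.csr_matrix(Y)  # normalise whatever sparse format came back
    checks[name] = Y.shape == X.shape and np.allclose(Y.toarray(), X.toarray())
    print(name, checks[name])
```

Calling .toarray() is only reasonable here because the test matrix is tiny; it would not be for the real 240760x110493 matrix.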
Let's pay attention to all the clues in the printed output:
array(<240760x110493 sparse matrix of type '<class 'numpy.float64'>'
with 20618831 stored elements in Compressed Sparse Row format>, dtype=object)
Outermost:
array(....,dtype=object)
A sparse matrix is not a regular array; to np.save, it is just a Python object. So np.save wrapped it in a dtype=object array and saved that. That array is 0d (hence the () shape), so all the indexing attempts fail. Try instead:
M=arr.item() # or
M=arr[()]
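To try both forms without the original file, here is a self-contained sketch; the 3x4 matrix and the temporary file are hypothetical stand-ins for the real X.npy. Both arr.item() and arr[()] pull the single stored object out of the 0-d array, after which normal sparse indexing works:

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Hypothetical small matrix standing in for the real data.
X = sp.csr_matrix(np.arange(12, dtype=np.float64).reshape(3, 4))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X.npy")
    np.save(path, X)
    arr = np.load(path, allow_pickle=True)  # 0-d object array wrapping the matrix

M = arr.item()  # extract the single Python object ...
M2 = arr[()]    # ... or index the 0-d array with an empty tuple

print(type(M).__name__, M.shape, M[1, 2])  # csr_matrix (3, 4) 6.0
```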
Now M should display as:
<240760x110493 sparse matrix of type '<class 'numpy.float64'>'
with 20618831 stored elements in Compressed Sparse Row format>
with attributes like M.shape. M.A (or M.toarray()) would give the dense form, but the matrix is too large for that to be useful: 240760 x 110493 float64 values come to roughly 213 GB.
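The size estimate follows directly from the numbers in the repr. This sketch assumes 8-byte float64 values and 4-byte (int32) column indices and row pointers for the CSR arrays, which is what SciPy typically uses when the dimensions fit in int32:

```python
# Figures taken from the repr in the question.
n_rows, n_cols, nnz = 240760, 110493, 20618831

# Dense float64 array: every entry costs 8 bytes, zeros included.
dense_bytes = n_rows * n_cols * 8

# CSR: values (8 bytes each) + column indices (4 bytes each) + row pointers (n_rows + 1).
csr_bytes = nnz * 8 + nnz * 4 + (n_rows + 1) * 4

print(f"dense: {dense_bytes / 1e9:.1f} GB")  # dense: 212.8 GB
print(f"CSR:   {csr_bytes / 1e6:.1f} MB")    # CSR:   248.4 MB
```

So the sparse form is roughly 850 times smaller, which is why it should stay sparse.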