Reputation: 65
I have a huge sparse matrix in SciPy and I would like to replace many of its elements with a given value (say, -1).
Is there a more efficient way to do it than using:
SM[[rows],[columns]]=-1
Here is an example:
Nr=seg.shape[0] #size ~=50000
Im1=sparse.csr_matrix(np.append(np.array([-1]),np.zeros([1,Nr-1])))
Im1=sparse.csr_matrix(sparse.vstack([Im1,sparse.eye(Nr)]))
Im1[prev[1::]-1,Num[1::]-1]=-1 # this line is very slow
Im2=sparse.vstack([sparse.csr_matrix(np.zeros([1,Nr])),sparse.eye(Nr)])
IM=sparse.hstack([Im1,Im2]) #final result
Upvotes: 1
Views: 2325
Reputation: 231665
I've played around with your sparse arrays. I'd encourage you to do some timings on smaller sizes, to see how different methods and sparse types behave. I like to use timeit in IPython.
Nr=10 # seg.shape[0] #size ~=50000
Im2=sparse.vstack([sparse.csr_matrix(np.zeros([1,Nr])),sparse.eye(Nr)])
Im2 has a zero first row, and an offset diagonal on the rest. So it's simpler, though not much faster, to start with an empty sparse matrix:
X = sparse.vstack([sparse.csr_matrix((1,Nr)),sparse.eye(Nr)])
Or use diags to construct the offset diagonal directly:
X = sparse.diags([1],[-1],shape=(Nr+1, Nr))
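As a quick sanity check, the three constructions can be compared on a small size. This is only a sketch: the names X_stack, X_diag and same are mine, and Nr=10 is the small test size used above.

```python
import numpy as np
from scipy import sparse

Nr = 10  # small test size

# Original construction: dense zero row stacked on top of an identity block.
Im2 = sparse.vstack([sparse.csr_matrix(np.zeros([1, Nr])), sparse.eye(Nr)])

# Same thing, starting from an empty (1, Nr) sparse matrix.
X_stack = sparse.vstack([sparse.csr_matrix((1, Nr)), sparse.eye(Nr)])

# Same thing, built directly as an offset diagonal.
X_diag = sparse.diags([1], [-1], shape=(Nr + 1, Nr))

# Compare the dense forms; fine at this size, not at Nr ~ 50000.
same = (np.array_equal(Im2.toarray(), X_stack.toarray())
        and np.array_equal(Im2.toarray(), X_diag.toarray()))
print(same)
```

The dense comparison is only practical for small Nr; it is there to confirm the shapes and values agree before scaling up.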
Im1 is similar, except it has a -1 in the (0,0) slot. How about stacking 2 diagonal matrices?
X = sparse.vstack([sparse.diags([-1],[0],(1,Nr)),sparse.eye(Nr)])
Or make the offset diagonal (copy Im2?), and modify [0,0]. Assigning to a csr matrix gives an efficiency warning, recommending the use of lil format. It does, though, take some time to convert with tolil().
X = sparse.diags([1],[-1],shape=(Nr+1, Nr)).tolil()
X[0,0] = -1 # slow warning with csr
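One way to keep the conversion cost down is to do all the edits in lil format and convert back once at the end. A minimal sketch, assuming the result is wanted in csr for later arithmetic (that requirement is my assumption, not something stated in the question):

```python
from scipy import sparse

Nr = 10  # small test size

# Build the offset diagonal, then switch to lil for cheap element assignment.
X = sparse.diags([1], [-1], shape=(Nr + 1, Nr)).tolil()

# Do every structural change while the matrix is still lil.
X[0, 0] = -1

# Convert once, after all assignments are done.
X = X.tocsr()
print(X.format, X.nnz)
```

Whether the two conversions pay for themselves depends on how many elements are being assigned, so it is worth timing both routes.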
Let's try your larger insertions:
prev = np.arange(Nr-2) # what are these like?
Num = np.arange(Nr-2)
Im1[prev[1::]-1,Num[1::]-1]=-1
With Nr=10, and various Im1 formats:
lil - 267 us
csr - 1.44 ms
coo - not supported
todense - 25 us
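Timings like these can be collected outside IPython with the timeit module. The sketch below is an assumption about how to set it up, not the exact code behind the numbers above: the helper assign and the names base and timings are mine, and each measurement also includes the cost of copying the matrix into the target format.

```python
import timeit
import warnings
import numpy as np
from scipy import sparse

Nr = 10
prev = np.arange(Nr - 2)  # placeholder index arrays, as in the answer
Num = np.arange(Nr - 2)

# Im1: a -1 in the (0,0) slot on top of an identity block.
base = sparse.vstack([sparse.diags([-1], [0], (1, Nr)), sparse.eye(Nr)])

def assign(fmt):
    """Copy base into the given format and assign -1 at the index pairs."""
    X = base.asformat(fmt, copy=True)
    with warnings.catch_warnings():
        # csr assignment emits SparseEfficiencyWarning; silence it while timing.
        warnings.simplefilter("ignore")
        X[prev[1:] - 1, Num[1:] - 1] = -1
    return X

timings = {}
for fmt in ("lil", "csr"):
    # Average seconds per call over 200 repetitions.
    timings[fmt] = timeit.timeit(lambda: assign(fmt), number=200) / 200

for fmt, seconds in timings.items():
    print(fmt, seconds)
```

Absolute numbers will differ by machine and SciPy version; the comparison between formats is the useful part.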
OK, I've picked prev and Num such that I end up modifying diagonals of Im1. In this case it would be faster to construct those diagonals right from the start.
X2=Im1.todia()
print(X2.data)
[[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[-1. -1. -1. -1. -1. -1. -1. 0. 0. 0.]]
print(X2.offsets)
[-1 0]
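For this particular choice of prev and Num, building the two diagonals up front could look like the sketch below. It only applies when the index pairs really do fall on a diagonal, which is an artifact of my placeholder arrays and may not hold for your data; the names sub, main, k and ref are mine.

```python
import numpy as np
from scipy import sparse

Nr = 10
prev = np.arange(Nr - 2)  # placeholder index arrays, as in the answer
Num = np.arange(Nr - 2)
k = len(prev) - 1         # number of -1 entries on the main diagonal

sub = np.ones(Nr)         # offset -1 diagonal: the shifted identity
main = np.zeros(Nr)       # offset 0 diagonal: -1 in the first k slots
main[:k] = -1

# Build both diagonals in one call, with no element assignment afterwards.
X = sparse.diags([sub, main], [-1, 0], shape=(Nr + 1, Nr), format="csr")

# Reference: the assignment route, done in lil format.
ref = sparse.vstack([sparse.diags([-1], [0], (1, Nr)), sparse.eye(Nr)]).tolil()
ref[prev[1:] - 1, Num[1:] - 1] = -1

print(np.array_equal(X.toarray(), ref.toarray()))
```

The padding zeros in main are stored explicitly by the dia layout, as in the X2.data output above.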
You may have to learn how various sparse formats are stored. csr and csc are a bit complex, designed for fast linear algebra operations. lil, dia, coo are simpler to understand.
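To get a feel for the layouts, it can help to print the underlying arrays of one small matrix in each format. A rough illustration, using a 3x3 matrix I made up for the purpose:

```python
import numpy as np
from scipy import sparse

# A small made-up matrix with three nonzero entries.
A = sparse.coo_matrix(np.array([[0, 2, 0],
                                [3, 0, 0],
                                [0, 0, 4]]))

# coo: three parallel arrays of row index, column index and value.
print("coo", A.row, A.col, A.data)

# lil: one list of column indices and one list of values per row.
L = A.tolil()
print("lil", L.rows, L.data)

# dia: one stored row per diagonal, plus the offset of each diagonal.
D = A.todia()
print("dia", D.offsets, D.data)

# csr: values and column indices, with indptr marking where each row starts.
C = A.tocsr()
print("csr", C.indptr, C.indices, C.data)
```

Changing the sparsity pattern of a csr matrix means rebuilding these arrays, which is a large part of why element assignment triggers the efficiency warning.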
Upvotes: 1