Reputation: 58861
I have a number of indices and values that make up a scipy.coo_matrix
. The indices/values are generated from different subroutines and are concatenate
d together before handed over to the matrix constructor:
import numpy
from scipy import sparse
n = 100000
I0 = range(n)
J0 = range(n)
V0 = numpy.random.rand(n)
I1 = range(n)
J1 = range(n)
V1 = numpy.random.rand(n)
# [...]
I = numpy.concatenate([I0, I1])
J = numpy.concatenate([J0, J1])
V = numpy.concatenate([V0, V1])
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
Now, the components of (I, J, V)
can be quite large such that the concatenate operations become significant. (In the above example it takes over 20% of the runtime on my machine.) I'm reading that it's not possible to concatenate without a copy.
Is there a way for handing over indices and values without copying the input data around first?
Upvotes: 0
Views: 90
Reputation: 231550
If you look at the code for coo_matrix.__init__
you'll see that it's pretty simple. In fact if the (V, (I,J))
inputs are right it will simply assign those 3 arrays to its .data
, row
, col
attributes. You can even check that after creation by comparing those attributes with your variables.
If they aren't 1d arrays of the right dtype, it will massage them - make the arrays, etc. So without getting into details, processing that you do before hand might save time in the coo
call.
self.row = np.array(row, copy=copy, dtype=idx_dtype)
self.col = np.array(col, copy=copy, dtype=idx_dtype)
self.data = np.array(obj, copy=copy)
One way or other those attributes will have to each be a single array, not a loose list of arrays or lists of lists.
sparse.bmat
makes a coo
matrix from other ones. It collected their coo
attributes, joins them in the fill an empty array
styles, and calls coo_matrix
. Look at its code.
Almost all numpy
operations that return a new array do so by allocating an empty
and filling it. Letting numpy do that in compiled code (with np.concatentate
) should be a be a little faster, but details like the size and number of inputs will make a difference.
A non_connonical
coo
matrix is just the start. Many operations require a conversion to one of the other formats.
Efficiently construct FEM/FVM matrix
This is about sparse matrix constrution where there are many duplicate points that need to be summed - and using using the csr
format for calculations.
Upvotes: 1
Reputation: 2690
You can try pre-allocating the arrays. It'll spare you the copy at least. I didn't see any speedup for the example, but you might see a change.
import numpy
from scipy import sparse
n = 100000
I = np.empty(2*n, np.double)
J = np.empty_like(I)
V = np.empty_like(I)
I[:n] = range(n)
J[:n] = range(n)
V[:n] = numpy.random.rand(n)
I[n:] = range(n)
J[n:] = range(n)
V[n:] = numpy.random.rand(n)
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
Upvotes: 0