Nico Schlömer

Reputation: 58861

coo_matrix without concatenate

I have a number of indices and values that make up a scipy.sparse.coo_matrix. The indices and values are generated by different subroutines and are concatenated before being handed over to the matrix constructor:

import numpy
from scipy import sparse

n = 100000

I0 = range(n)
J0 = range(n)
V0 = numpy.random.rand(n)

I1 = range(n)
J1 = range(n)
V1 = numpy.random.rand(n)

# [...]

I = numpy.concatenate([I0, I1])
J = numpy.concatenate([J0, J1])
V = numpy.concatenate([V0, V1])

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))

Now, the components of (I, J, V) can be large enough that the concatenate operations take a significant amount of time. (In the above example they account for over 20% of the runtime on my machine.) From what I've read, it's not possible to concatenate without a copy.

Is there a way to hand over the indices and values without copying the input data around first?

Upvotes: 0

Views: 90

Answers (2)

hpaulj

Reputation: 231550

If you look at the code for coo_matrix.__init__ you'll see that it's pretty simple. In fact, if the (V, (I, J)) inputs are right, it will simply assign those 3 arrays to its .data, .row and .col attributes. You can even check that after creation by comparing those attributes with your variables.

If they aren't 1d arrays of the right dtype, it will massage them: convert them to arrays, and so on. So without getting into details, processing that you do beforehand might save time in the coo_matrix call.

            self.row = np.array(row, copy=copy, dtype=idx_dtype)
            self.col = np.array(col, copy=copy, dtype=idx_dtype)
            self.data = np.array(obj, copy=copy)
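The pass-through behaviour described above can be checked directly. This is only a sketch: it assumes float64 values and 32-bit integer indices, and the exact result for the index arrays may depend on the SciPy version, so the flags are printed rather than taken for granted.

```python
import numpy as np
from scipy import sparse

n = 1000

# Inputs that are already 1d arrays: 32-bit integer indices, float64 values.
I = np.arange(n, dtype=np.int32)
J = np.arange(n, dtype=np.int32)
V = np.random.rand(n)

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))

# True means the attribute is backed by the same buffer as the input,
# i.e. the constructor did not copy it.
data_shared = np.shares_memory(matrix.data, V)
row_shared = np.shares_memory(matrix.row, I)
col_shared = np.shares_memory(matrix.col, J)
print(data_shared, row_shared, col_shared)
```

If one of the flags comes out False, the constructor had to convert that input, which is the "massaging" cost mentioned above.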

One way or another, each of those attributes will have to be a single array, not a loose list of arrays or a list of lists.

sparse.bmat makes a coo matrix from other ones. It collects their coo attributes, joins them in the fill-an-empty-array style, and calls coo_matrix. Look at its code.

Almost all numpy operations that return a new array do so by allocating an empty one and filling it. Letting numpy do that in compiled code (with np.concatenate) should be a little faster, but details like the size and number of inputs will make a difference.
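For illustration, the fill-an-empty-array style looks roughly like this. It is a minimal sketch, not the actual sparse.bmat source; join_chunks is a hypothetical helper name.

```python
import numpy as np


def join_chunks(chunks, dtype):
    """Join 1d chunks by allocating the result once and filling it slice by slice."""
    total = sum(len(c) for c in chunks)
    out = np.empty(total, dtype=dtype)
    offset = 0
    for c in chunks:
        # Each chunk is copied exactly once, into its final position.
        out[offset:offset + len(c)] = c
        offset += len(c)
    return out


chunks = [np.arange(5), np.arange(3), np.arange(4)]
joined = join_chunks(chunks, dtype=np.int32)
```

The result holds the same values as np.concatenate(chunks); the difference is only where the loop runs (Python here, compiled code in numpy).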

A non-canonical coo matrix is just the start. Many operations require a conversion to one of the other formats.


Efficiently construct FEM/FVM matrix. This is about sparse matrix construction where there are many duplicate points that need to be summed, and about using the csr format for calculations.
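A small sketch of what summing duplicates means in practice: the coo matrix stores repeated (row, col) entries as given, and the conversion to csr adds them up. The index and value arrays here are made up for the example.

```python
import numpy as np
from scipy import sparse

# Entry (0, 0) appears twice, with values 1.0 and 3.0.
I = np.array([0, 1, 0], dtype=np.int32)
J = np.array([0, 1, 0], dtype=np.int32)
V = np.array([1.0, 2.0, 3.0])

coo = sparse.coo_matrix((V, (I, J)), shape=(2, 2))
csr = coo.tocsr()

print(coo.nnz)    # 3 stored entries, duplicates included
print(csr.nnz)    # 2 entries after the duplicates are summed
print(csr[0, 0])  # 4.0
```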

Upvotes: 1

Elliot

Reputation: 2690

You can try pre-allocating the arrays. It'll spare you the copy at least. I didn't see any speedup for the example, but you might see a change.

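import numpy
from scipy import sparse

n = 100000

# Index arrays need an integer dtype; only the values are floating point.
I = numpy.empty(2 * n, dtype=numpy.int32)
J = numpy.empty_like(I)
V = numpy.empty(2 * n, dtype=numpy.double)

# Each block of results is written straight into its slice.
I[:n] = numpy.arange(n)
J[:n] = numpy.arange(n)
V[:n] = numpy.random.rand(n)

I[n:] = numpy.arange(n)
J[n:] = numpy.arange(n)
V[n:] = numpy.random.rand(n)

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))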

Upvotes: 0
