Jose
Jose

Reputation: 2129

Looping over large sparse array

Let's say I have a (sparse) matrix M size (N*N, N*N). I want to select elements from this matrix where the outer product of grid (a (n,m) array, where n*m=N) is True (it is a boolean 2D array, and na=grid.sum()). This can be done as follows

result = M[np.outer( grid.flatten(),grid.flatten() )].reshape (( N, N ) )

result is an (na,na) sparse array (and na < N). The previous line is what I want to achieve: get the elements of M that are true from the product of grid, and squeeze the ones that aren't true out of the array.

As n and m (and hence N) grow, and M and result are sparse matrices, I am not able to do this efficiently in terms of memory or speed. Closest I have tried is:

result = sp.lil_matrix ( (1, N*N), dtype=np.float32 )
# Calculate outer product
A = np.einsum("i,j", grid.flatten(), grid.flatten())  
cntr = 0
it = np.nditer ( A, flags=['multi_index'] )
while not it.finished:
    if it[0]:
        result[0,cntr] = M[it.multi_index[0], it.multi_index[1]]
        cntr += 1
# reshape result to be a N*N sparse matrix

The last reshape could be done by this approach, but I haven't got there yet, as the while loop is taking forever.

I have also tried selecting nonzero elements of A too, and looping over but this eats up all of the memory:

A=np.einsum("i,j", grid.flatten(), grid.flatten())  
nzero = A.nonzero() # This eats lots of memory
cntr = 0
for (i,j) in zip (*nzero):
    temp_mat[0,cntr] = M[i,j]
    cnt += 1

'n' and 'm' in the example above are around 300.

Upvotes: 0

Views: 321

Answers (1)

hpaulj
hpaulj

Reputation: 231385

I don't know if it was a typo, or code error, but your example is missing an iternext:

R=[]
it = np.nditer ( A, flags=['multi_index'] )
while not it.finished:
    if it[0]:
        R.append(M[it.multi_index])
    it.iternext()

I think appending to a list is simpler and faster than R[ctnr]=.... It's competitive if R is a regular array, and sparse indexing is slower (even the fastest lil format).

ndindex wraps this use of a nditer as:

R=[]
for index in np.ndindex(A.shape):
    if A[index]:
        R.append(M[index])

ndenumerate also works:

R = []
for index,a in np.ndenumerate(A):
   if a:
       R.append(M[index])

But I wonder if you really want to advance the cntr each it step, not just the True cases. Otherwise reshaping result to (N,N) doesn't make much sense. But in that case, isn't your problem just

M[:N, :N].multiply(A)

or if M was a dense array:

M[:N, :N]*A

In fact if both M and A are sparse, then the .data attribute of that multiply will be the same as the R list.

In [76]: N=4
In [77]: M=np.arange(N*N*N*N).reshape(N*N,N*N)
In [80]: a=np.array([0,1,0,1])
In [81]: A=np.einsum('i,j',a,a)
In [82]: A
Out[82]: 
array([[0, 0, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 0],
       [0, 1, 0, 1]])

In [83]: M[:N, :N]*A
Out[83]: 
array([[ 0,  0,  0,  0],
       [ 0, 17,  0, 19],
       [ 0,  0,  0,  0],
       [ 0, 49,  0, 51]])

In [84]: c=sparse.csr_matrix(M)[:N,:N].multiply(sparse.csr_matrix(A))
In [85]: c.data
Out[85]: array([17, 19, 49, 51], dtype=int32)

In [89]: [M[index] for index, a in np.ndenumerate(A) if a]
Out[89]: [17, 19, 49, 51]

Upvotes: 1

Related Questions