rocknrollnerd
rocknrollnerd

Reputation: 2634

Dot product of two sparse matrices affecting zero values only

I'm trying to compute a simple dot product but leave nonzero values from the original matrix unchanged. A toy example:

import numpy as np

A = np.array([[2, 1, 1, 2],
              [0, 2, 1, 0],
              [1, 0, 1, 1],
              [2, 2, 1, 0]])
B = np.array([[ 0.54331039,  0.41018682,  0.1582158 ,  0.3486124 ],
              [ 0.68804647,  0.29520239,  0.40654206,  0.20473451],
              [ 0.69857579,  0.38958572,  0.30361365,  0.32256483],
              [ 0.46195299,  0.79863505,  0.22431876,  0.59054473]])

Desired outcome:

C = np.array([[ 2.        ,  1.        ,  1.        ,  2.        ],
              [ 2.07466874,  2.        ,  1.        ,  0.73203386],
              [ 1.        ,  1.5984076 ,  1.        ,  1.        ],
              [ 2.        ,  2.        ,  1.        ,  1.42925865]])

The actual matrices in question, however, are sparse and look more like this:

A = sparse.rand(250000, 1700, density=0.001, format='csr')
B = sparse.rand(1700, 1700, density=0.02, format='csr')

One simple way go would be just setting the values using mask index, like that:

mask = A != 0
C = A.dot(B)
C[mask] = A[mask]

However, my original arrays are sparse and quite large, so changing them via index assignment is painfully slow. Conversion to lil matrix helps a bit, but again, conversion itself takes a lot of time.

The other obvious approach, I guess, would be just resort to iteration and skip masked values, but I'd like not to throw away the benefits of numpy/scipy-optimized array multiplication.

Some clarifications: I'm actually interested in some kind of special case, where B is always square, and therefore, A and C are of the same shape. So if there's a solution that doesn't work on arbitrary arrays but fits in my case, that's fine.

UPDATE: Some attempts:

from scipy import sparse
import numpy as np

def naive(A, B):
    mask = A != 0
    out = A.dot(B).tolil()
    out[mask] = A[mask]
    return out.tocsr()


def proposed(A, B):
    Az = A == 0
    R, C = np.where(Az)
    out = A.copy()
    out[Az] = np.einsum('ij,ji->i', A[R], B[:, C])
    return out


%timeit naive(A, B)
1 loops, best of 3: 4.04 s per loop

%timeit proposed(A, B)
/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py:215: SparseEfficiencyWarning: Comparing a sparse matrix with 0 using == is inefficient, try using != instead.

/usr/local/lib/python2.7/dist-packages/scipy/sparse/coo.pyc in __init__(self, arg1, shape, dtype, copy)
    173                     self.shape = M.shape
    174 
--> 175                 self.row, self.col = M.nonzero()
    176                 self.data = M[self.row, self.col]
    177                 self.has_canonical_format = True

MemoryError: 

ANOTHER UPDATE:

Couldn't make anything more or less useful out of Cython, at least without going too far away from Python. The idea was to leave the dot product to scipy and just try to set those original values as fast as possible, something like this:

cimport cython


@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef coo_replace(int [:] row1, int [:] col1, float [:] data1, int[:] row2, int[:] col2, float[:] data2):
    cdef int N = row1.shape[0]
    cdef int M = row2.shape[0]
    cdef int i, j
    cdef dict d = {}

    for i in range(M):
        d[(row2[i], col2[i])] = data2[i]

    for j in range(N):
        if (row1[j], col1[j]) in d:
            data1[j] = d[(row1[j], col1[j])]

This was a bit better then my pre-first "naive" implementation (using .tolil()), but following hpaulj's approach, lil can be thrown out. Maybe replacing python dict with something like std::map would help.

Upvotes: 7

Views: 1565

Answers (3)

Tasos Papastylianou
Tasos Papastylianou

Reputation: 22215

Python isn't my main language, but I thought this was an interesting problem and I wanted to give this a stab :)

Preliminaries:

import numpy
import scipy.sparse
# example matrices and sparse versions
A = numpy.array([[1, 2, 0, 1], [1, 0, 1, 2], [0, 1, 2 ,1], [1, 2, 1, 0]])
B = numpy.array([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
A_s = scipy.sparse.lil_matrix(A)
B_s = scipy.sparse.lil_matrix(B)

So you want to convert the original problem of:

C = A.dot(B)
C[A.nonzero()] = A[A.nonzero()]

To something sparse-y. Just to get this out of the way, the direct "sparse" translation of the above is:

C_s = A_s.dot(B_s)
C_s[A_s.nonzero()] = A_s[A_s.nonzero()]

But it sounds like you're not happy about this, as it calculates all the dot products first, which you worry might be inefficient.

So, your question is, if you find the zeros first, and only evaluate dot products on those elements, will that be faster? I.e. for a dense matrix this could be something like:

Xs, Ys = numpy.nonzero(A==0)
D = A[:]
D[Xs, Ys] = map ( lambda x,y: A[x,:].dot(B[:,y]), Xs, Ys)

Let's translate this to a sparse matrix. My main stumbling block here was finding the "Zero" indices; since A_s==0 doesn't make sense for sparse matrices, I found them this way:

Xmax, Ymax = A_s.shape
DenseSize = Xmax * Ymax
Xgrid, Ygrid = numpy.mgrid[0:Xmax, 0:Ymax]
Ygrid = Ygrid.reshape([DenseSize,1])[:,0]
Xgrid = Xgrid.reshape([DenseSize,1])[:,0]
AllIndices = numpy.array([Xgrid, Ygrid])
NonzeroIndices = numpy.array(A_s.nonzero())
ZeroIndices = numpy.array([x for x in AllIndices.T.tolist() if x not in NonzeroIndices.T.tolist()]).T

If you know of a better / faster way, by all means try it. Once we have the Zero indices, we can do a similar mapping as before:

D_s = A_s[:]
D_s[ZeroIndices[0], ZeroIndices[1]] = map ( lambda x, y : A_s[x,:].dot(B[:,y])[0], ZeroIndices[0], ZeroIndices[1] )

which gives you your sparse matrix result.

Now I don't know if this is faster or not. I mostly took a stab because it was an interesting problem, and to see if I could do it in python. In fact I suspect it might not be faster than direct whole-matrix dotproduct, because it uses listcomprehensions and mapping on a large dataset (like you say, you expect a lot of zeros). But it is an answer to your question of "how can I only calculate dot products for the zero values without doing multiplying the matrices as a whole". I'd be interested to see if you do try this how it compares in terms of speed on your datasets.

EDIT: I'm putting below an example "block processing" version based on the above, which I think should allow you to process your large dataset without problems. Let me know if it works.

import numpy
import scipy.sparse
# example matrices and sparse versions
A = numpy.array([[1, 2, 0, 1], [1, 0, 1, 2], [0, 1, 2 ,1], [1, 2, 1, 0]])
B = numpy.array([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
A_s = scipy.sparse.lil_matrix(A)
B_s = scipy.sparse.lil_matrix(B)

# Choose a grid division (i.e. how many processing blocks you want to create)
BlockGrid = numpy.array([2,2])

D_s = A_s[:] # initialise from A

Xmax, Ymax = A_s.shape
BaseBSiz = numpy.array([Xmax, Ymax]) / BlockGrid
for BIndX in range(0, Xmax, BlockGrid[0]):
  for BIndY in range(0, Ymax, BlockGrid[1]):
    BSizX, BSizY = D_s[ BIndX : BIndX + BaseBSiz[0], BIndY : BIndY + BaseBSiz[1] ].shape
    Xgrid, Ygrid = numpy.mgrid[BIndX : BIndX + BSizX, BIndY : BIndY + BSizY]
    Xgrid = Xgrid.reshape([BSizX*BSizY,1])[:,0]
    Ygrid = Ygrid.reshape([BSizX*BSizY,1])[:,0]
    AllInd = numpy.array([Xgrid, Ygrid]).T
    NZeroInd = numpy.array(A_s[Xgrid, Ygrid].reshape((BSizX,BSizY)).nonzero()).T + numpy.array([[BIndX],[BIndY]]).T
    ZeroInd = numpy.array([x for x in AllInd.tolist() if x not in NZeroInd.tolist()]).T
    #
    # Replace zero-values in current block
    D_s[ZeroInd[0], ZeroInd[1]] = map ( lambda x, y : A_s[x,:].dot(B[:,y])[0], ZeroInd[0], ZeroInd[1] )

Upvotes: 0

hpaulj
hpaulj

Reputation: 231335

A possibly cleaner and faster version of your naive code:

In [57]: r,c=A.nonzero()    # this uses A.tocoo()

In [58]: C=A*B
In [59]: Cl=C.tolil()
In [60]: Cl[r,c]=A.tolil()[r,c]
In [61]: Cl.tocsr()

C[r,c]=A[r,c] gives an efficiency warning, but I think that's aimed more a people do that kind of assignment in loop.

In [63]: %%timeit C=A*B
    ...: C[r,c]=A[r,c]
...
The slowest run took 7.32 times longer than the fastest....
1000 loops, best of 3: 334 µs per loop

In [64]: %%timeit C=A*B
    ...: Cl=C.tolil()
    ...: Cl[r,c]=A.tolil()[r,c]
    ...: Cl.tocsr()
    ...: 
100 loops, best of 3: 2.83 ms per loop

My A is small, only (250,100), but it looks like the round trip to lil isn't a time saver, despite the warning.

Masking with A==0 is bound to give problems when A is sparse

In [66]: Az=A==0
....SparseEfficiencyWarning...
In [67]: r1,c1=Az.nonzero()

Compared to the nonzero r for A, this r1 is much larger - the row index of all zeros in the sparse matrix; everything but the 25 nonzeros.

In [70]: r.shape
Out[70]: (25,)

In [71]: r1.shape
Out[71]: (24975,)

If I index A with that r1 I get a much larger array. In effect I am repeating each row by the number of zeros in it

In [72]: A[r1,:]
Out[72]: 
<24975x100 sparse matrix of type '<class 'numpy.float64'>'
    with 2473 stored elements in Compressed Sparse Row format>

In [73]: A
Out[73]: 
<250x100 sparse matrix of type '<class 'numpy.float64'>'
    with 25 stored elements in Compressed Sparse Row format>

I've increased the shape and number of nonzero elements by roughly 100 (the number of columns).

Defining foo, and copying Divakar's tests:

def foo(A,B):
    r,c = A.nonzero()
    C = A*B
    C[r,c] = A[r,c]
    return C

In [83]: timeit naive(A,B)
100 loops, best of 3: 2.53 ms per loop

In [84]: timeit proposed(A,B)
/...
  SparseEfficiencyWarning)
100 loops, best of 3: 4.48 ms per loop

In [85]: timeit foo(A,B)
...
  SparseEfficiencyWarning)
100 loops, best of 3: 2.13 ms per loop

So my version has a modest speed inprovement. As Divakar found out, changing sparsity changes the relative advantages. I expect size to also change them.

The fact that A.nonzero uses the coo format, suggests it might be feasible to construct the new array with that format. A lot of sparse code builds a new matrix via the coo values.

In [97]: Co=C.tocoo()    
In [98]: Ao=A.tocoo()

In [99]: r=np.concatenate((Co.row,Ao.row))
In [100]: c=np.concatenate((Co.col,Ao.col))
In [101]: d=np.concatenate((Co.data,Ao.data))

In [102]: r.shape
Out[102]: (79,)

In [103]: C1=sparse.csr_matrix((d,(r,c)),shape=A.shape)

In [104]: C1
Out[104]: 
<250x100 sparse matrix of type '<class 'numpy.float64'>'
    with 78 stored elements in Compressed Sparse Row format>

This C1 has, I think, the same non-zero elements as the C constructed by other means. But I think one value is different because the r is longer. In this particular example, C and A share one nonzero element, and the coo style of input sums those, where as we'd prefer to have A values overwrite everything.

If you can tolerate this discrepancy, this is a faster way (at least for this test case):

def bar(A,B):
    C=A*B
    Co=C.tocoo()
    Ao=A.tocoo()
    r=np.concatenate((Co.row,Ao.row))
    c=np.concatenate((Co.col,Ao.col))
    d=np.concatenate((Co.data,Ao.data))
    return sparse.csr_matrix((d,(r,c)),shape=A.shape)

In [107]: timeit bar(A,B)
1000 loops, best of 3: 1.03 ms per loop

Upvotes: 3

Divakar
Divakar

Reputation: 221514

Cracked it! Well, there's a lot of scipy stuffs specific to sparse matrices that I learnt along the way. Here's the implementation that I could muster -

# Find the indices in output array that are to be updated  
R,C = ((A!=0).dot(B!=0)).nonzero()
mask = np.asarray(A[R,C]==0).ravel()
R,C = R[mask],C[mask]

# Make a copy of A and get the dot product through sliced rows and columns
# off A and B using the definition of matrix-multiplication    
out = A.copy()
out[R,C] = (A[R].multiply(B[:,C].T).sum(1)).ravel()   

The most expensive part seems to be element-wise multiplication and summing. On some quick timing tests, it seems that this would be good on a sparse matrices with a high degree of sparsity to beat the original dot-mask-based solution in terms of performance, which I think comes from its focus on memory efficiency.

Runtime test

Function definitions -

def naive(A, B):
    mask = A != 0
    out = A.dot(B).tolil()
    out[mask] = A[mask]
    return out.tocsr()

def proposed(A, B):
    R,C = ((A!=0).dot(B!=0)).nonzero()
    mask = np.asarray(A[R,C]==0).ravel()
    R,C = R[mask],C[mask]
    out = A.copy()
    out[R,C] = (A[R].multiply(B[:,C].T).sum(1)).ravel()    
    return out

Timings -

In [57]: # Input matrices 
    ...: M,N = 25000, 170       
    ...: A = sparse.rand(M, N, density=0.001, format='csr')
    ...: B = sparse.rand(N, N, density=0.02, format='csr')
    ...: 

In [58]: %timeit naive(A, B)
10 loops, best of 3: 92.2 ms per loop

In [59]: %timeit proposed(A, B)
10 loops, best of 3: 132 ms per loop

In [60]: # Input matrices with increased sparse-ness
    ...: M,N = 25000, 170       
    ...: A = sparse.rand(M, N, density=0.0001, format='csr')
    ...: B = sparse.rand(N, N, density=0.002, format='csr')
    ...: 

In [61]: %timeit naive(A, B)
10 loops, best of 3: 78.1 ms per loop

In [62]: %timeit proposed(A, B)
100 loops, best of 3: 8.03 ms per loop

Upvotes: 3

Related Questions