omer sahban

Reputation: 97

How to ignore zeros in matrix multiplications?

Say I have a 10000 x 10000 matrix W of random numbers, and two 10000-dimensional vectors U and V, where U contains random numbers and V is filled with zeros. With numpy or pytorch, computing U @ W and V @ W takes the same amount of time. My question is: is there a way to optimize matrix multiplication so that it skips or ignores zeros during the calculation, so that products like V @ W are computed faster?

import numpy as np
W = np.random.rand(10000, 10000)

U = np.random.rand(10000)
V = np.zeros(10000)

y1 = U @ W
y2 = V @ W
# computing y2 should take less time than y1, since the result is always a zero vector.
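
For the all-zero case specifically, one obvious shortcut is to test V before multiplying, since an O(n) scan is far cheaper than the O(n^2) product. A minimal sketch (the helper name is made up), which does nothing for vectors that are merely mostly zero:

import numpy as np

def matvec_skip_zero(v, W):
    # Bail out early when v is entirely zero; the result is known
    # without touching W at all.
    if not np.any(v):
        return np.zeros(W.shape[1])
    return v @ W

W = np.random.rand(10000, 10000)
V = np.zeros(10000)
y2 = matvec_skip_zero(V, W)  # returns np.zeros(10000) almost instantly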

Upvotes: 2

Views: 1163

Answers (2)

hpaulj

Reputation: 231385

In [274]: W = np.random.rand(10000, 10000)
     ...:
     ...: U = np.random.rand(10000)
     ...: V = np.zeros(10000)
In [275]: timeit U@W
125 ms ± 263 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [276]: timeit V@W
153 ms ± 18.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now consider a case where 100 of the elements of V are nonzero (1s). A sparse implementation could be:

In [277]: Vdata=np.ones(100); Vind=np.arange(0,10000,100)
In [278]: Vind.shape
Out[278]: (100,)
In [279]: timeit Vdata@W[Vind,:]
4.99 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I was a little surprised by this timing; I had expected the cost of indexing W to cancel out the savings in multiplication.

Let's change V to verify the results:

In [280]: V[Vind]=1
In [281]: np.allclose(V@W, Vdata@W[Vind,:])
Out[281]: True

What if I have to find the nonzero elements first:

In [282]: np.allclose(np.where(V),Vind)
Out[282]: True
In [283]: timeit idx=np.where(V); V[idx]@W[idx,:]
5.07 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The size of W, especially that 2nd dimension, is probably a big factor in this speedup. At these sizes, memory management can affect speed as much as the raw multiplications.
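
Wrapped up as a reusable helper, the indexing trick above looks roughly like this (the function name is mine, and there is no input validation):

import numpy as np

def sparse_matvec(v, W):
    # Compute v @ W using only the rows of W that the nonzero
    # entries of v actually touch.
    idx = np.nonzero(v)[0]            # positions of the nonzero entries
    if idx.size == v.size:            # nothing to skip; dense path is fine
        return v @ W
    return v[idx] @ W[idx, :]         # product over the selected rows only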

===

In this case, scipy.sparse does better than I anticipated (other tests suggested I'd need sparsity around 1% to get a time advantage):

In [294]: from scipy import sparse
In [295]: Vc=sparse.csr_matrix(V)
In [296]: Vc.dot(W)
Out[296]:
array([[46.01437545, 50.46422246, 44.80337192, ..., 55.57660691,
        45.54413903, 48.28613399]])
In [297]: V.dot(W)
Out[297]:
array([46.01437545, 50.46422246, 44.80337192, ..., 55.57660691,
       45.54413903, 48.28613399])
In [298]: np.allclose(Vc.dot(W),V@W)
Out[298]: True

In [299]: timeit Vc.dot(W)
1.48 ms ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Even with the sparse creation:

In [300]: timeit Vm=sparse.csr_matrix(V); Vm.dot(W)
2.01 ms ± 7.89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
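
If you want to find the break-even sparsity on your own machine, a rough probe like the following works (the densities tested are arbitrary, and single wall-clock timings are noisier than timeit):

import time
import numpy as np
from scipy import sparse

W = np.random.rand(10000, 10000)

for density in (0.001, 0.01, 0.05, 0.1):
    v = np.zeros(10000)
    k = int(density * v.size)
    v[np.random.choice(v.size, k, replace=False)] = 1.0  # scatter k ones
    vs = sparse.csr_matrix(v)

    t0 = time.perf_counter()
    vs.dot(W)                         # sparse-times-dense product
    t1 = time.perf_counter()
    v @ W                             # plain dense product
    t2 = time.perf_counter()
    print(f"density {density:.1%}: sparse {t1 - t0:.4f} s, dense {t2 - t1:.4f} s")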

Upvotes: 0

Grr

Reputation: 16079

You can use the scipy.sparse classes to improve your performance, but it depends entirely on the matrix. For example, the performance gained from treating V as a sparse matrix is large. The gain from converting U is not, and it may in fact decrease performance, since U is dense.

import numpy as np
import scipy.sparse as sps

W = np.random.rand(10000, 10000)
U = np.random.rand(10000)
V = np.zeros(10000)

%timeit U @ W
125 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit V @ W
128 ms ± 6.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Vsp = sps.csr_matrix(V)
Usp = sps.csr_matrix(U)
Wsp = sps.csr_matrix(W)

%timeit Vsp.dot(Wsp)
1.34 ms ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit Vsp @ Wsp
1.39 ms ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit Usp @ Wsp
2.37 s ± 84.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As you can see, there is a major improvement from using sparse methods for V @ W, but performance for U @ W actually gets worse, since none of the entries in U or W are zero.
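
If the density of your vectors varies at runtime, one option is to dispatch on it; a sketch (the function name and the 1% cutoff are mine, and the cutoff needs tuning per machine and matrix size):

import numpy as np
import scipy.sparse as sps

def adaptive_matvec(v, W, threshold=0.01):
    # Route through scipy.sparse only when v is sparse enough for the
    # conversion overhead to pay off; otherwise use the dense product.
    density = np.count_nonzero(v) / v.size
    if density < threshold:
        # csr_matrix(v) has shape (1, n), so the product is (1, m);
        # ravel it back to the 1-D shape that v @ W would return.
        return np.asarray(sps.csr_matrix(v).dot(W)).ravel()
    return v @ W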

Upvotes: 3
