user11376013

How to vectorize computation on arrays of different dimensions?

I have some large numpy arrays of complex numbers I need to perform computations on.

import numpy as np

# Reduced sizes -- real ones are orders of magnitude larger
n, d, l = 50000, 3, 1000

# Two complex matrices and a column vector
x = np.random.rand(n, d) + 1j*np.random.rand(n, d)
s = np.random.rand(n, d) + 1j*np.random.rand(n, d)
v = np.random.rand(l)[:, np.newaxis]

The computation is basically x*v*s for each row of x (and s), with that product then summed across the row (the d axis). Because the arrays have different sizes, I can't figure out a way to vectorize the computation, and a for-loop is way too slow.

My current implementation is this (~3.5 seconds):

h = []
for i in range(len(x)):
    h.append(np.sum(x[i,:]*v*s[i,:], axis=1))

h = np.asarray(h)

I also tried using np.apply_along_axis() with an augmented matrix, but it's only slightly faster (~2.6 s) and not very readable.

def func(m, v):
    return np.sum(m[:d]*v*m[d:], axis=1)

h = np.apply_along_axis(func, 1, np.hstack([x, s]), v)

What's a much quicker way to compute this result? I can leverage other packages such as dask if that helps.

Upvotes: 0

Views: 96

Answers (1)

hpaulj

Reputation: 231325

With broadcasting this should work:

np.sum((x*s)[...,None]*v[:,0], axis=1)

but with your sample dimensions I get a memory error: the 'outer' broadcasted array, of shape (n, d, l), is too large for my memory.
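
For scale, the size of that single (n, d, l) complex128 temporary at the question's dimensions is easy to estimate (a back-of-the-envelope check, not from my session):

import numpy as np

n, d, l = 50000, 3, 1000
# The broadcast product (x*s)[..., None]*v[:, 0] materializes an (n, d, l) complex128 array
print(n * d * l * np.dtype(np.complex128).itemsize / 1e9, "GB")  # ~2.4 GB for one temporary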

I can reduce memory usage by iterating on the smaller d dimension:

res = np.zeros((n,l), dtype=x.dtype) 
for i in range(d): 
    res += (x[:,i]*s[:,i])[:,None]*v[:,0] 

This matches your h, but I wasn't able to complete the timings at these sizes. Generally, iterating on the smaller dimension is faster.

Below I repeat the tests with smaller dimensions (n = 5000).

This can probably also be expressed as an einsum problem, though that may not help at these sizes.


In [1]: n, d, l = 5000, 3, 1000 
   ...:  
   ...: # Two complex matrices and a column vector 
   ...: x = np.random.rand(n, d) + 1j*np.random.rand(n, d) 
   ...: s = np.random.rand(n, d) + 1j*np.random.rand(n, d) 
   ...: v = np.random.rand(l)[:, np.newaxis]                                                           
In [2]: h = [] 
   ...: for i in range(len(x)): 
   ...:     h.append(np.sum(x[i,:]*v*s[i,:], axis=1)) 
   ...:  
   ...: h = np.asarray(h)                                                                              
In [3]: h.shape                                                                                        
Out[3]: (5000, 1000)
In [4]: res = np.zeros((n,l), dtype=x.dtype) 
   ...: for i in range(d): 
   ...:     res += (x[:,i]*s[:,i])[:,None]*v[:,0] 
   ...:                                                                                                
In [5]: res.shape                                                                                      
Out[5]: (5000, 1000)
In [6]: np.allclose(res,h)                                                                             
Out[6]: True
In [7]: %%timeit  
   ...: h = [] 
   ...: for i in range(len(x)): 
   ...:     h.append(np.sum(x[i,:]*v*s[i,:], axis=1)) 
   ...: h = np.asarray(h) 
   ...:  
   ...:                                                                                                
490 ms ± 3.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %%timeit 
   ...: res = np.zeros((n,l), dtype=x.dtype) 
   ...: for i in range(d): 
   ...:     res += (x[:,i]*s[:,i])[:,None]*v[:,0] 
   ...:                                                                                                

354 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [9]: np.sum((x*s)[...,None]*v[:,0], axis=1).shape                                                   
Out[9]: (5000, 1000)
In [10]: out = np.sum((x*s)[...,None]*v[:,0], axis=1)                                                  
In [11]: np.allclose(h,out)                                                                            
Out[11]: True
In [12]: timeit out = np.sum((x*s)[...,None]*v[:,0], axis=1)                                           
310 ms ± 964 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Some time savings, but not big.

And the einsum version:

In [13]: np.einsum('ij,ij,k->ik',x,s,v[:,0]).shape                                                     
Out[13]: (5000, 1000)
In [14]: np.allclose(np.einsum('ij,ij,k->ik',x,s,v[:,0]),h)                                            
Out[14]: True
In [15]: timeit np.einsum('ij,ij,k->ik',x,s,v[:,0]).shape                                              
167 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Good time savings. But I don't know how it will scale.


But the einsum made me realize that we can sum over the d dimension earlier, before multiplying by v, and gain a lot in both time and memory usage:

In [16]: np.allclose(np.sum(x*s, axis=1)[:,None]*v[:,0],h)                                             
Out[16]: True
In [17]: timeit np.sum(x*s, axis=1)[:,None]*v[:,0]                                                     
68.4 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@cs95 got there first!
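
For reference, pulled out of the session, that fastest variant applied to the question's x, s and v is just one line (a restatement of In [16]/[17], not a new timing):

# Sum x*s over the d axis first (cheap), then broadcast the resulting (n, 1)
# column against the length-l vector v[:, 0] to get the (n, l) result.
h = np.sum(x*s, axis=1)[:, None] * v[:, 0]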

As per @PaulPanzer's comment, the optimize flag helps. It's probably making the same deduction, summing over j early:

In [18]: timeit np.einsum('ij,ij,k->ik',x,s,v[:,0],optimize=True).shape                                
91.6 ms ± 991 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
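
To see which contraction order optimize picks (presumably summing over j before bringing in k), np.einsum_path can report the chosen path; this check wasn't part of the original session:

# Print the contraction path and intermediate shapes that einsum's optimizer would use
path, info = np.einsum_path('ij,ij,k->ik', x, s, v[:,0], optimize='greedy')
print(info)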

Upvotes: 2
