PiZed

Reputation: 77

Numpy vectorized summation with variable number of factors

I am currently computing a function that contains a summation over an index. The index is between 0 and the integer part of T; ideally I would like to be able to compute this summation quickly for several values of T. In a real-life case, most of the values of T are small, but a small percentage can be one or two orders of magnitude larger than the average.

What I am doing now is:
1) I define the vector T, e.g. (my real-life data have many more entries; this is just to give an idea):

import numpy as np 
T = np.random.exponential(5, 10)

2) I create a matrix containing, in each column, the factors for the indices between 0 and int(T), and zeros elsewhere:

n = int(T.max())
j = ((np.arange(n) < T[:,np.newaxis])*np.arange(1,n+1)).astype(int).transpose()
print(j)

[[ 1 1 1 1 1 1 1 1 1 1]
[ 2 0 2 2 2 0 2 0 2 2]
[ 0 0 3 0 3 0 3 0 3 3]
[ 0 0 4 0 4 0 0 0 4 4]
[ 0 0 5 0 5 0 0 0 5 5]
[ 0 0 6 0 6 0 0 0 6 6]
[ 0 0 7 0 7 0 0 0 0 7]
[ 0 0 8 0 8 0 0 0 0 8]
[ 0 0 9 0 9 0 0 0 0 9]
[ 0 0 0 0 10 0 0 0 0 10]
[ 0 0 0 0 11 0 0 0 0 0]
[ 0 0 0 0 12 0 0 0 0 0]]

3) I generate the individual terms of the summation, using a mask so that the zero entries of j do not contribute:

A = np.log(1 + (1 + j) * 5) * (j > 0)

4) I sum along the columns:

A.sum(axis=0)

Obtaining: array([ 5.170484 , 2.39789527, 29.96464821, 5.170484 , 42.29052851, 2.39789527, 8.21500643, 2.39789527, 18.49060911, 33.9899999 ])

Is there a faster/better way to vectorize this? I have the feeling that it is very slow due to the large number of zeros that do not contribute to the sum, but since I am a beginner with NumPy I couldn't figure out a better way of writing it.

EDIT: in my actual problem, the function applied to j also depends on a second parameter tau (a vector of the same size as T), so the entries in the different columns are no longer the same.
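For concreteness, here is a minimal sketch of what step 3 looks like with such a tau dependence (the function and the tau values are only placeholders):

import numpy as np

T = np.random.exponential(5, 10)
tau = np.random.exponential(1, 10)  # placeholder second parameter, same size as T

n = int(T.max())
# same masked matrix as in step 2
j = ((np.arange(n) < T[:,np.newaxis])*np.arange(1,n+1)).astype(int).transpose()

# broadcast tau across the rows so that each column uses its own tau value
A = np.log(1 + (1 + j) * tau[np.newaxis,:]) * (j > 0)
A.sum(axis=0)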

Upvotes: 2

Views: 780

Answers (1)

Divakar

Reputation: 221744

Looking at your j, each column has numbers going from 1 to N, where N is decided by the corresponding element of T. You then sum along each column, which is the same as summing up to N, because the rest of the elements are zeros anyway. Those summed values can be obtained with np.cumsum, and the N values, which are basically the limits of each column in j, can be computed directly from T. The N values are then used as indices into the cumsum-ed values to give us the final output.

This should be pretty fast and memory-efficient, given that the cumsum is the only real computation and it is done on a 1D array, compared with the original approach's summation over a 2D array along each column. Thus, we have a vectorized approach like so -

n = int(T.max())
vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()
out = vals[(T.astype(int)).clip(max=n-1)]

In terms of memory usage, we are generating three variables -

n    : Scalar
vals : 1D array of n elements 
out  : 1D array of T.size elements (this is the output anyway)
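As a rough check of that claim, the 2D intermediate j built in the original approach can be compared against the 1D vals array used here; only the shapes are printed, since the exact byte counts depend on the platform's default integer size -

T = np.random.exponential(5, 10000)
n = int(T.max())
j = ((np.arange(n) < T[:,None])*np.arange(1,n+1)).astype(int).transpose()  # 2D: n x T.size elements
vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()                   # 1D: n elements
print(j.shape, vals.shape)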

Runtime test and output verification -

In [5]: def original_app(T):
   ...:     n = int(T.max())
   ...:     j = ((np.arange(n) < T[:,None])*np.arange(1,n+1)).astype(int).transpose()
   ...:     A =  np.log(1 + (1 + j) * 5)* (j>0) 
   ...:     return A.sum(axis=0)
   ...: 
   ...: def vectorized_app(T):
   ...:     n = int(T.max())
   ...:     vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()
   ...:     return vals[(T.astype(int)).clip(max=n-1)]
   ...: 

In [6]: # Input array
   ...: T = np.random.exponential(5, 10000)

In [7]: %timeit original_app(T)
100 loops, best of 3: 9.62 ms per loop

In [8]: %timeit vectorized_app(T)
10000 loops, best of 3: 50.1 µs per loop

In [9]: np.allclose(original_app(T),vectorized_app(T)) # Verify outputs
Out[9]: True

Upvotes: 2
