Reputation: 77
I am currently computing a function that contains a summation over an index. The index is between 0 and the integer part of T; ideally I would like to be able to compute this summation quickly for several values of T. In a real-life case, most of the values of T are small, but a small percentage can be one or two orders of magnitude larger than the average.
What I am doing now is:
1) I define the vector T, e.g. (my real-life data have far more entries; this is just to give an idea):
import numpy as np
T = np.random.exponential(5, 10)
2) I create a matrix whose columns contain, for each entry of T, the summation indices (roughly 1 up to int(T)), padded with zeroes:
n = int(T.max())
# each column i keeps the indices k+1 for which k < T[i], and is zero elsewhere
j = ((np.arange(n) < T[:, np.newaxis]) * np.arange(1, n + 1)).astype(int).transpose()
print(j)
[[ 1  1  1  1  1  1  1  1  1  1]
 [ 2  0  2  2  2  0  2  0  2  2]
 [ 0  0  3  0  3  0  3  0  3  3]
 [ 0  0  4  0  4  0  0  0  4  4]
 [ 0  0  5  0  5  0  0  0  5  5]
 [ 0  0  6  0  6  0  0  0  6  6]
 [ 0  0  7  0  7  0  0  0  0  7]
 [ 0  0  8  0  8  0  0  0  0  8]
 [ 0  0  9  0  9  0  0  0  0  9]
 [ 0  0  0  0 10  0  0  0  0 10]
 [ 0  0  0  0 11  0  0  0  0  0]
 [ 0  0  0  0 12  0  0  0  0  0]]
3) I generate the individual terms of the summation, multiplying by a mask so that the padding zeroes do not contribute (the function is still evaluated there, but those terms are zeroed out):
A = np.log(1 + (1 + j) * 5) * (j > 0)   # terms f(j), masked where j == 0
4) I sum along the columns:
A.sum(axis=0)
Obtaining:
array([  5.170484  ,   2.39789527,  29.96464821,   5.170484  ,
        42.29052851,   2.39789527,   8.21500643,   2.39789527,
        18.49060911,  33.9899999 ])
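For reference, steps 1) to 4) compute, for each entry of T, the same thing as this plain per-entry loop (just a sketch; naive_sum is an illustrative name):
import numpy as np

def naive_sum(T):
    # loop equivalent of steps 1)-4): sum f(m) = log(1 + (1 + m)*5)
    # over the nonzero entries of each column of j
    n = int(T.max())
    out = np.empty(T.size)
    for i, t in enumerate(T):
        N = np.count_nonzero(np.arange(n) < t)  # nonzero entries in column i
        out[i] = np.log(1 + (1 + np.arange(1, N + 1)) * 5).sum()
    return out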
Is there a faster/better way to vectorize this? I have the feeling that it is very slow due to the large number of zeroes that do not contribute to the sum, but since I am a beginner with NumPy I couldn't figure out a better way of writing it.
EDIT: in my actual problem, the function applied to j also depends on a second parameter tau (a vector of the same size as T), so the terms contained in each column are not the same.
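To illustrate what I mean, with a per-entry tau the masked version would look something like this (the exact form of the function is just a made-up example):
tau = np.random.exponential(1, T.size)    # same length as T
A = np.log(1 + (1 + j) * tau) * (j > 0)   # tau broadcasts across the columns of j
A.sum(axis=0)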
Upvotes: 2
Views: 780
Reputation: 221744
Looking at your j, each column has numbers going from 1 to N, where N is decided by the corresponding element of T. Then, you are summing along each column, which is the same as summing up to N, because the rest of the elements are zeros anyway. Those partial sums could be calculated with np.cumsum, and those N values, which are basically the limits of each column in j, could be calculated directly from T. These N values are then used as indices into the cumsum-ed values to give us the final output.
This should be pretty fast and memory efficient, given that cumsum is the only computation done, and that on a 1D array, as compared to the summation done in the original approach on a 2D array along each column. Thus, we have a vectorized approach like so -
n = int(T.max())
# partial sums of f(m) = log(1 + (1 + m)*5) for m = 1..n
vals = (np.log(1 + (1 + np.arange(1, n + 1)) * 5)).cumsum()
# floor(T) selects the right partial sum; clip keeps the largest entry in bounds
out = vals[(T.astype(int)).clip(max=n - 1)]
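A quick check on a tiny input might make the indexing clearer (T_small is just an illustrative name):
T_small = np.array([1.2, 3.7, 0.4])
n = int(T_small.max())                                    # 3
vals = np.log(1 + (1 + np.arange(1, n + 1)) * 5).cumsum()
# floor indices [1, 3, 0] are clipped to [1, 2, 0], so the largest
# entry picks up the full partial sum f(1) + f(2) + f(3)
print(vals[T_small.astype(int).clip(max=n - 1)])
# -> approximately [5.170484, 8.21500643, 2.39789527]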
In terms of memory usage, we are generating three variables -
n : Scalar
vals : 1D array of n elements
out : 1D array of T.size elements (this is the output anyway)
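As a rough way to see the difference in footprint, one could compare the 2D temporary j from the original approach against vals (sizes depend on the random draw of T; the nbytes counts are just for illustration):
T = np.random.exponential(5, 10000)
n = int(T.max())
j = ((np.arange(n) < T[:, None]) * np.arange(1, n + 1)).astype(int).T
vals = np.log(1 + (1 + np.arange(1, n + 1)) * 5).cumsum()
print(j.nbytes, vals.nbytes)   # 2D (n x 10000) temporary vs. length-n 1D array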
Runtime test and verify output -
In [5]: def original_app(T):
...: n = int(T.max())
...: j = ((np.arange(n) < T[:,None])*np.arange(1,n+1)).astype(int).transpose()
...: A = np.log(1 + (1 + j) * 5)* (j>0)
...: return A.sum(axis=0)
...:
...: def vectorized_app(T):
...: n = int(T.max())
...: vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()
...: return vals[(T.astype(int)).clip(max=n-1)]
...:
In [6]: # Input array
...: T = np.random.exponential(5, 10000)
In [7]: %timeit original_app(T)
100 loops, best of 3: 9.62 ms per loop
In [8]: %timeit vectorized_app(T)
10000 loops, best of 3: 50.1 µs per loop
In [9]: np.allclose(original_app(T),vectorized_app(T)) # Verify outputs
Out[9]: True
Upvotes: 2