Reputation: 17241
Say I have a range r=numpy.array(range(1, 6))
and I am calculating the cumulative sum using numpy.cumsum(r)
. But instead of returning [1, 3, 6, 10, 15]
I would like it to return [1, 3, 6]
because of the condition that the cumulative result must be less than 10.
If the array is very large, I would like the cumulative sum to break out before it starts calculating values that are redundant and will be thrown away later. Of course, I am trivializing everything here for the sake of the question.
Is it possible to break out of cumsum
or cumprod
early based on a condition?
Upvotes: 1
Views: 197
Reputation: 40753
Depending on size of the array you are computing the cumulative sum for and how quickly you expect the target value to be reached it may be quicker to calculate the cumulative sum in steps.
import numpy as np
size = 1000000
target = size
def stepped_cumsum():
arr = np.arange(size)
out = np.empty(len(arr), dtype=int)
step = 1000
last_value = 0
for i in range(0, len(arr), step):
np.cumsum(arr[i:i+step], out=out[i:i+step])
out[i:i+step] += last_value
last_value = out[i+step-1]
if last_value >= target:
break
else:
return out
greater_than_target_index = i + (out[i:i+step] >= target).argmax()
# .copy() required so rest of backing array can be freed
return out[:greater_than_target_index].copy()
def normal_cumsum():
arr = np.arange(size)
out = np.cumsum(arr)
return out
stepped_result = stepped_cumsum()
normal_result = normal_cumsum()
assert (stepped_result < target).all()
assert (stepped_result == normal_result[:len(stepped_result)]).all()
Results:
In [60]: %timeit cumsum.stepped_cumsum()
1000 loops, best of 3: 1.22 ms per loop
In [61]: %timeit cumsum.normal_cumsum()
100 loops, best of 3: 3.69 ms per loop
Upvotes: 0
Reputation: 18488
I don't think this is possible with any function in numpy, since in most cases these are meant for vectorized computations on fixed-length arrays. One obvious way to do what you want is to break out of a standard for-loop in Python (as I assume you know):
def limited_cumsum(x, limit):
y = []
sm = 0
for item in x:
sm += item
if sm > limit:
return y
y.append(sm)
return y
But this would obviously be an order of magnitude slower than numpy's cumsum.
Since you probably need some very specialized function, the changes are low to find the exact function you need in numpy. You should probably have a look at Cython, which allows you to implement custom functions that are as flexible as a Python function (and using a syntax that is almost Python), with a speed close to that of C.
Upvotes: 2