Reputation: 3
I am trying to build a function that counts up zeros in a list until nonzero entry appears, and start counting from 0 again. For example,
>>> a
array([[ 0, 0, 1, 0, 2],
[ 0, 0, 0, 1, 1],
[ 0, 1, 0, 0, 0],
[ 0, 0, 10, 2, 2],
[ 2, 0, 0, 0, 0]])
In this case, my desired output will be
array([[1, 1, 0, 1, 0],
[2, 2, 1, 0, 0],
[3, 0, 2, 1, 1],
[4, 1, 0, 0, 0],
[0, 2, 1, 1, 1]])
I tried making this with two for-loops, but it is very slow on a very large data set. I was hoping I could find a way to vectorize this operation so that the time is O(n) instead of O(n^2). Any help will be appreciated!
Upvotes: 0
Views: 527
Reputation: 700
What about something like this, which is vectorized and has no for
loops:
def moving_count(a, value, axis=0):
"""Return sequential counts of a given value along an axis"""
if np.all(a == value):
# Fill the output with counts along the desired axis
return np.rollaxis(np.arange(1, a.size + 1).reshape(a.shape), axis)
# Allocate output with a cumulative count of value along flattened axis
output = np.cumsum(np.rollaxis(a, axis) == value)
# Find locations of breakpoints
breakpoints = (np.roll(output, 1) - output) == 0
# Since breakpoints is boolean, argmax returns the location of the first breakpoint
threshold = np.argmax(breakpoints)
# Repeat the cumulative value along labels and subtract to reset the count at each breakpoint
output[threshold:] -= output[breakpoints][np.cumsum(breakpoints) - 1][threshold:]
# Reshape and return axis to match input array
return np.rollaxis(output.reshape(a.shape), axis)
Applied to your problem:
In[3]: a = np.array([[ 0, 0, 1, 0, 2],
[ 0, 0, 0, 1, 1],
[ 0, 1, 0, 0, 0],
[ 0, 0, 10, 2, 2],
[ 2, 0, 0, 0, 0]])
In[4]: moving_count(a, 0, 1)
Out[4]:
array([[1, 1, 0, 2, 0],
[2, 2, 1, 0, 0],
[3, 0, 2, 1, 1],
[4, 1, 0, 0, 0],
[0, 2, 1, 1, 1]])
Upvotes: 0
Reputation: 1866
Something like this might make it a little faster:
a = np.array([[ 0, 0, 1, 0, 2],
[ 0, 0, 0, 1, 1],
[ 0, 1, 0, 0, 0],
[ 0, 0, 10, 2, 2],
[ 2, 0, 0, 0, 0]])
b = (a == 0)
c = np.zeros_like(a)
c[0, :] += b[0, :]
for i in range(1, c.shape[1]):
c[i, :] = b[i, :] * (1 + c[i-1, :])
Array 'c' gives the desired result.
Or optimizing a little further...
a = ...
b = (a == 0) * 1
for i in range(1, b.shape[1]):
b[i, :] *= (1 + b[i-1, :])
Now 'b' is your result, and you've got one array less to deal with.
You'll notice that this algorithm still has the same time complexity as the 'two for loop' solution, but now one of those loops is being internalized by numpy, so I would expect a speedup on a large array.
Upvotes: 1