anyhoon
anyhoon

Reputation: 3

Vectorized Counting in 2D array python

I am trying to build a function that counts up zeros in a list until nonzero entry appears, and start counting from 0 again. For example,

>>> a
array([[ 0,  0,  1,  0,  2],
       [ 0,  0,  0,  1,  1],
       [ 0,  1,  0,  0,  0],
       [ 0,  0, 10,  2,  2],
       [ 2,  0,  0,  0,  0]])

In this case, my desired output will be

array([[1, 1, 0, 1, 0],
       [2, 2, 1, 0, 0],
       [3, 0, 2, 1, 1],
       [4, 1, 0, 0, 0],
       [0, 2, 1, 1, 1]])

I tried making this with two for-loops, but it is very slow on a very large data set. I was hoping I could find a way to vectorize this operation so that the time is O(n) instead of O(n^2). Any help will be appreciated!

Upvotes: 0

Views: 527

Answers (2)

Devin Cairns
Devin Cairns

Reputation: 700

What about something like this, which is vectorized and has no for loops:

def moving_count(a, value, axis=0):
    """Return sequential counts of a given value along an axis"""
    if np.all(a == value):
        # Fill the output with counts along the desired axis
        return np.rollaxis(np.arange(1, a.size + 1).reshape(a.shape), axis)

    # Allocate output with a cumulative count of value along flattened axis
    output = np.cumsum(np.rollaxis(a, axis) == value)

    # Find locations of breakpoints
    breakpoints = (np.roll(output, 1) - output) == 0

    # Since breakpoints is boolean, argmax returns the location of the first breakpoint
    threshold = np.argmax(breakpoints)

    # Repeat the cumulative value along labels and subtract to reset the count at each breakpoint
    output[threshold:] -= output[breakpoints][np.cumsum(breakpoints) - 1][threshold:]

    # Reshape and return axis to match input array
    return np.rollaxis(output.reshape(a.shape), axis)

Applied to your problem:

In[3]: a = np.array([[ 0,  0,  1,  0,  2],
                     [ 0,  0,  0,  1,  1],
                     [ 0,  1,  0,  0,  0],
                     [ 0,  0, 10,  2,  2],
                     [ 2,  0,  0,  0,  0]])
In[4]: moving_count(a, 0, 1)
Out[4]: 

array([[1, 1, 0, 2, 0],
       [2, 2, 1, 0, 0],
       [3, 0, 2, 1, 1],
       [4, 1, 0, 0, 0],
       [0, 2, 1, 1, 1]])

Upvotes: 0

tvt173
tvt173

Reputation: 1866

Something like this might make it a little faster:

a = np.array([[ 0,  0,  1,  0,  2],
              [ 0,  0,  0,  1,  1],
              [ 0,  1,  0,  0,  0],
              [ 0,  0, 10,  2,  2],
              [ 2,  0,  0,  0,  0]])

b = (a == 0)
c = np.zeros_like(a)
c[0, :] += b[0, :]
for i in range(1, c.shape[1]):
    c[i, :] = b[i, :] * (1 + c[i-1, :])

Array 'c' gives the desired result.

Or optimizing a little further...

a = ...
b = (a == 0) * 1
for i in range(1, b.shape[1]):
    b[i, :] *= (1 + b[i-1, :])

Now 'b' is your result, and you've got one array less to deal with.

You'll notice that this algorithm still has the same time complexity as the 'two for loop' solution, but now one of those loops is being internalized by numpy, so I would expect a speedup on a large array.

Upvotes: 1

Related Questions