kmario23
kmario23

Reputation: 61405

Computing a "moving sum of counts" on a NumPy array

I have the following arrays:

# input
In [77]: arr = np.array([23, 45, 23, 0, 12, 45, 45])

# result
In [78]: res = np.zeros_like(arr)

Now, I want to compute a moving sum of unique elements and store it in the res array.

Concretely, res array should be:

In [79]: res
Out[79]: array([1, 1, 2, 1, 1, 2, 3])

[23, 45, 23, 0, 12, 45, 45]
[1,   1,   2,   1,   1,   2,   3]

We start counting each element and increment the count if an element re-appears, until we reach end of the array. This element specific counts should be returned as result.


How should we do achieve this using NumPy built-in functions? I tried using numpy.bincount but it gives undesired results.

Upvotes: 1

Views: 226

Answers (1)

Paul Panzer
Paul Panzer

Reputation: 53069

Not sure you'll find a builtin, so here is a homebrew using argsort.

def running_count(arr):
    idx = arr.argsort(kind='mergesort')
    sarr = arr[idx]
    neq = np.where(sarr[1:] != sarr[:-1])[0] + 1
    run = np.ones(arr.shape, int)
    run[neq[0]] -= neq[0]
    run[neq[1:]] -= np.diff(neq)
    res = np.empty_like(run)
    res[idx] = run.cumsum()
    return res

For example:

>>> running_count(arr)
array([1, 1, 2, 1, 1, 2, 3])
>>> running_count(np.array(list("xabaaybeeetz")))
array([1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1])

Explainer:

We first sort using argsort because we need indices to go back to original order in the end. Here it is important to have a stable sort, hence the use of the slow mergesort.

Once the elements are sorted the running counts will form a "saw tooth" pattern. The vectorized way to create this is to observe that the diff of a saw tooth has "jump" values where a new tooth starts and ones everywhere else. So that is staight-forward to construct.

Upvotes: 3

Related Questions