Reputation: 61405
I have the following arrays:
# input
In [77]: arr = np.array([23, 45, 23, 0, 12, 45, 45])
# result
In [78]: res = np.zeros_like(arr)
Now, I want to compute a moving sum of unique elements and store it in the res
array.
Concretely, res
array should be:
In [79]: res
Out[79]: array([1, 1, 2, 1, 1, 2, 3])
[23, 45, 23, 0, 12, 45, 45]
[1, 1, 2, 1, 1, 2, 3]
We start counting each element and increment the count if an element re-appears, until we reach end of the array. This element specific counts should be returned as result.
How should we do achieve this using NumPy built-in functions? I tried using numpy.bincount
but it gives undesired results.
Upvotes: 1
Views: 226
Reputation: 53069
Not sure you'll find a builtin, so here is a homebrew using argsort.
def running_count(arr):
idx = arr.argsort(kind='mergesort')
sarr = arr[idx]
neq = np.where(sarr[1:] != sarr[:-1])[0] + 1
run = np.ones(arr.shape, int)
run[neq[0]] -= neq[0]
run[neq[1:]] -= np.diff(neq)
res = np.empty_like(run)
res[idx] = run.cumsum()
return res
For example:
>>> running_count(arr)
array([1, 1, 2, 1, 1, 2, 3])
>>> running_count(np.array(list("xabaaybeeetz")))
array([1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1])
Explainer:
We first sort using argsort because we need indices to go back to original order in the end. Here it is important to have a stable sort, hence the use of the slow mergesort.
Once the elements are sorted the running counts will form a "saw tooth" pattern. The vectorized way to create this is to observe that the diff of a saw tooth has "jump" values where a new tooth starts and ones everywhere else. So that is staight-forward to construct.
Upvotes: 3