QuantumPotatoe
QuantumPotatoe

Reputation: 85

Sticky cumsum/cumprod numpy

Foreword: I'll probably update the title as I suspect I only lack the correct wording for my problem. (Which makes it a potential duplicate, sorry about that). I'd appreciate suggestions to do so.

Here's functionally what I want to accomplish:

import numpy as np

a = np.array([1, 0, 0, 0, 1, 0])
b = np.array([0, 0, 1, 0, 0, 0])

# What I want:
result = [1, 1, 0, 0, 1, 1]
# [1, 0] -> 1 
# [0, 1] -> 0
# [0, 0] -> the last value

To try and be clearer, here's a naive implementation (Goal being obviously to do that with numpy methods):

def sticky_cumsum(a, b):
    result = []
    for val_a, val_b in zip(a, b):
        if val_a:
            results.append(1)
        elif val_b: # assuming both can't be true
            results.append(0)
        else: # assuming first iteration won't be [0, 0]
            results.append(results[-1])
    return results

Edit 1: As pointed out in the comments, [0, 0] in first position has an undefined behaviour. That was a simplified case of my actual context (which would be irrelevant). It's assumed that [0, 0] will never happen in first position.

Upvotes: 2

Views: 118

Answers (2)

Josmoor98
Josmoor98

Reputation: 1811

An alternative using Cython

If you are able to use Cython, you could use something like the following, which can be run from a Jupyter Notebook using the following

%load_ext Cython

Then in a seperate cell, run.

%%cython -a

from cython cimport boundscheck, wraparound
cimport numpy as np
import numpy as np

@boundscheck(False)
@wraparound(False)
def cython_sticky_cumsum(int[::1] a, int[::1] b):

    cdef size_t i, N = a.size
    cdef np.ndarray[np.int32_t, ndim=1] result = np.empty(N, dtype=np.int32)

    for i in range(N):
        if a[i] == 1:
            result[i] = 1
        elif b[i] == :
            result[i] = 0
        else:
            result[i] = result[i-1]

    return result

If you're concerned with performance/using large arrays, something like the above may be preferable. I guess this depends on what you find more readable.

a = np.array([1, 0, 0, 0, 1, 0])
b = np.array([0, 0, 1, 0, 0, 0])

cython_sticky_cumsum(a, b)
# array([1, 1, 0, 0, 1, 1])

Crude test

For much larger arrays, such as,

a = np.tile(np.array([1, 0, 0, 0, 1, 0]), 1000000)
b = np.tile(np.array([0, 0, 1, 0, 0, 0]), 1000000)

and running the tests,

%timeit cython_sticky_cumsum(a,b)
%timeit sticky_cumsum(a, b)

outputs

28.4 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.5 s ± 97.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150785

You can use np.select to mask (1,0) and (0,1). Then use this answer to fill in the nan with previous values:

arr = np.select((a==1, b==1), (1,0), np.nan)

# inspired by the linked question
mask = np.isnan(arr)
idx = np.where(~mask,np.arange(len(mask)), 0)
np.maximum.accumulate(idx,out=idx)
out = arr[idx]

Output:

array([1., 1., 0., 0., 1., 1.])

Upvotes: 4

Related Questions