Reputation: 85
Foreword: I'll probably update the title as I suspect I only lack the correct wording for my problem. (Which makes it a potential duplicate, sorry about that). I'd appreciate suggestions to do so.
Here's functionally what I want to accomplish:
import numpy as np
a = np.array([1, 0, 0, 0, 1, 0])
b = np.array([0, 0, 1, 0, 0, 0])
# What I want:
result = [1, 1, 0, 0, 1, 1]
# [1, 0] -> 1
# [0, 1] -> 0
# [0, 0] -> the last value
To try and be clearer, here's a naive implementation (Goal being obviously to do that with numpy methods):
def sticky_cumsum(a, b):
result = []
for val_a, val_b in zip(a, b):
if val_a:
results.append(1)
elif val_b: # assuming both can't be true
results.append(0)
else: # assuming first iteration won't be [0, 0]
results.append(results[-1])
return results
Edit 1: As pointed out in the comments, [0, 0] in first position has an undefined behaviour. That was a simplified case of my actual context (which would be irrelevant). It's assumed that [0, 0] will never happen in first position.
Upvotes: 2
Views: 118
Reputation: 1811
If you are able to use Cython, you could use something like the following, which can be run from a Jupyter Notebook using the following
%load_ext Cython
Then in a seperate cell, run.
%%cython -a
from cython cimport boundscheck, wraparound
cimport numpy as np
import numpy as np
@boundscheck(False)
@wraparound(False)
def cython_sticky_cumsum(int[::1] a, int[::1] b):
cdef size_t i, N = a.size
cdef np.ndarray[np.int32_t, ndim=1] result = np.empty(N, dtype=np.int32)
for i in range(N):
if a[i] == 1:
result[i] = 1
elif b[i] == :
result[i] = 0
else:
result[i] = result[i-1]
return result
If you're concerned with performance/using large arrays, something like the above may be preferable. I guess this depends on what you find more readable.
a = np.array([1, 0, 0, 0, 1, 0])
b = np.array([0, 0, 1, 0, 0, 0])
cython_sticky_cumsum(a, b)
# array([1, 1, 0, 0, 1, 1])
For much larger arrays, such as,
a = np.tile(np.array([1, 0, 0, 0, 1, 0]), 1000000)
b = np.tile(np.array([0, 0, 1, 0, 0, 0]), 1000000)
and running the tests,
%timeit cython_sticky_cumsum(a,b)
%timeit sticky_cumsum(a, b)
outputs
28.4 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.5 s ± 97.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 0
Reputation: 150785
You can use np.select
to mask (1,0)
and (0,1)
. Then use this answer to fill in the nan
with previous values:
arr = np.select((a==1, b==1), (1,0), np.nan)
# inspired by the linked question
mask = np.isnan(arr)
idx = np.where(~mask,np.arange(len(mask)), 0)
np.maximum.accumulate(idx,out=idx)
out = arr[idx]
Output:
array([1., 1., 0., 0., 1., 1.])
Upvotes: 4