Reputation: 781
I have an enormous 1D numpy array of booleans w
and an increasing list of indices i
, which splits w
into len(i)+1
subarrays. A toy example is:
w=numpy.array([True,False,False,False,True,True,True,True,False,False])
i=numpy.array([0,0,2,5,5,8,8])
I wish to compute a numpy array wi
, whose i
-th entry is 1 if the i
-th subarray contains a True
and 0 otherwise. In other words, the i
-th entry of w
is the sum (logical 'or') of elements of the i
-th subarray of w
. In our example, the output is:
[0 0 1 1 0 1 0 0]
This is achieved with the code:
wi=numpy.fromiter(map(numpy.any,numpy.split(w,i)),int)
Is there a more efficient way of doing this or is this optimal as far as memory is concerned?
P.S. related post
Upvotes: 1
Views: 156
Reputation: 221524
For efficiency (memory and performance), use np.bitwise_or.reduceat
as it keeps the output in boolean -
In [10]: np.bitwise_or.reduceat(w,np.r_[0,i])
Out[10]: array([ True, True, False, True, False, False])
To have as int output, view as int
-
In [11]: np.bitwise_or.reduceat(w,np.r_[0,i]).view('i1')
Out[11]: array([1, 1, 0, 1, 0, 0], dtype=int8)
Here's all-weather solution -
def slice_reduce_or(w, i):
valid = i<len(w)
invalidc =( ~valid).sum()
i = i[valid]
mi = np.r_[i[:-1]!=i[1:],True]
pp = i[mi]
p1 = np.bitwise_or.reduceat(w,pp)
N = len(i)+1
out = np.zeros(N+invalidc, dtype=bool)
out[1:N][mi] = p1
out[0] = w[:i[0]].any()
return out.view('i1')
Upvotes: 2
Reputation: 150735
Let's try np.add.reductat
:
wi = np.add.reduceat(w,np.r_[0,i]).astype(bool)
output:
array([1, 1, 0, 1, 0, 0])
And performance:
%timeit -n 100 wi = np.add.reduceat(w,np.r_[0,i]).astype(bool).astype(int)
21.7 µs ± 7.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n 100 wi=np.fromiter(map(np.any,np.split(w,i)),int)
44.5 µs ± 7.79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So we're looking at about 2x speed here.
Upvotes: 2