Reputation: 8068
I'm trying to extract sequences from an array b for which a boolean array a is used as index (len(a) >= len(b)
, but (a==True).sum() == len(b)
, i.e. there are only as many values true in a than there are elements in b). The sequences should be represented in the result as start and end index of a where a[i]
is true and for which there are consecutive values.
For instance, for the following arrays of a and b
a = np.asarray([True, True, False, False, False, True, True, True, False])
b = [1, 2, 3, 4, 5]
the result should be [((0, 1), [1, 2]), ((5, 7), [3, 4, 5])]
, so as many elements in the array as there are true-sequences. Each true sequence should contain the start and end index from a and the values these relate to from b).
So for the above:
[
((0, 1), [1, 2]), # first true sequence: starting at index=0 (in a), ending at index=1, mapping to the values [1, 2] in b
((5, 7), [3, 4, 5]) # second true sequence: starting at index=5, ending at index=7, with values in b=[3, 4, 5]
]
How can this be done efficiently in numpy?
Upvotes: 3
Views: 1182
Reputation: 221514
Here's one NumPy based one inspired by this post
-
def func1(a,b):
# "Enclose" mask with sentients to catch shifts later on
mask = np.r_[False,a,False]
# Get the shifting indices
idx = np.flatnonzero(mask[1:] != mask[:-1])
s0,s1 = idx[::2], idx[1::2]
idx_b = np.r_[0,(s1-s0).cumsum()]
out = []
for (i,j,k,l) in zip(s0,s1-1,idx_b[:-1],idx_b[1:]):
out.append(((i, j), b[k:l]))
return out
Sample run -
In [104]: a
Out[104]: array([ True, True, False, False, False, True, True, True, False])
In [105]: b
Out[105]: [1, 2, 3, 4, 5]
In [106]: func1(a,b)
Out[106]: [((0, 1), [1, 2]), ((5, 7), [3, 4, 5])]
Timings -
In [156]: # Using given sample data and tiling it 1000x
...: a = np.asarray([True, True, False, False, False, True, True, True, False])
...: b = [1, 2, 3, 4, 5]
...: a = np.tile(a,1000)
...: b = np.tile(b,1000)
# @Chris's soln
In [157]: %%timeit
...: res = []
...: gen = (i for i in b)
...: for k, g in itertools.groupby(enumerate(a), lambda x:x[1]):
...: if k:
...: ind, bools = list(zip(*g))
...: res.append((ind[0::len(ind)-1], list(itertools.islice(gen, len(bools)))))
100 loops, best of 3: 13.8 ms per loop
In [158]: %timeit func1(a,b)
1000 loops, best of 3: 1.29 ms per loop
Upvotes: 3
Reputation: 107
Solution uses a method where() from numpy:
result = []
f = np.where(a)[0]
m = 1
for j in list(create(f)):
lo = j[1]-j[0]+1
result.append((j, [*range(m, m + lo)]))
m += lo
print(result)
#OUTPUT: [((0, 1), [1, 2]), ((5, 7), [3, 4, 5])]
And there is a method to split an array [0 1 5 6 7] -- > [(0, 1), (5, 7)]:
def create(k):
le = len(k)
i = 0
while i < le:
left = k[i]
while i < le - 1 and k[i] + 1 == k[i + 1]:
i += 1
right = k[i]
if right - left >= 1:
yield (left, right)
elif right - left == 1:
yield (left, )
yield (right, )
else:
yield (left, )
i += 1
Upvotes: 1
Reputation: 29742
Using itertools.groupby
and itertools.islice
:
import itertools
res = []
gen = (i for i in b)
for k, g in itertools.groupby(enumerate(a), lambda x:x[1]):
if k:
ind, bools = list(zip(*g))
res.append((ind[0::len(ind)-1], list(itertools.islice(gen, len(bools)))))
Output
[((0, 1), [1, 2]), ((5, 7), [3, 4, 5])]
Insights:
itertools.groupby
returns grouped object of True
s and False
s.list[0::len(list)-1]
returns first and last element of list
.b
always have a same number of True
s, make b
a generator
and grab as many elements as there are True
s.Time taken:
def itertool_version():
res = []
gen = (i for i in b)
for k, g in itertools.groupby(enumerate(a), lambda x:x[1]):
if k:
ind, bools = list(zip(*g))
res.append((ind[0::len(ind)-1], list(itertools.islice(gen, len(bools)))))
return res
%timeit itertool()
7.11 µs ± 313 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Upvotes: 2
Reputation: 1422
Here is my two cents. Hope it helps. [EDITED]
# Get Data
a = np.asarray([True, True, False, False, False, True, True, True, False])
b = [1, 2, 3, 4, 5]
# Assign Index names
ac = ac.astype(float)
ac[ac==1] = b
# Select edges
ac[(np.roll(ac, 1) != 0) & (np.roll(ac, -1) != 0)] = 0 # Clear out intermediates
indices = ac[ac != 0] # Select only edges
indices.reshape(2, int(indices.shape[0]/2)) # group in pairs
Output
>> [[1, 2], [3, 5]]
Upvotes: 1
Reputation: 160
I don't know about a solution using numpy, but maybe the following for-loop solution will help you (or others) finding a different, more efficient solution:
import numpy as np
a = np.asarray([True, True, False, False, False, True, True, True, False])
b = []
temp_list = []
count = 0
for val in a:
if (val):
count += 1
temp_list.append(count) if len(temp_list) == 0 else None # Only add the first 'True' value in a sequence
# Code only reached if val is not true > append b if temp_list has more than 1 entry
elif (len(temp_list) > 0):
temp_list.append(count) # Add the last true value in a sequence
b.append(temp_list)
temp_list = []
print(b)
>>> [[1, 2], [3, 5]]
Upvotes: 1