TommasoF

Reputation: 825

Efficient numpy subarrays extraction from a mask

I am looking for a Pythonic way to extract multiple subarrays from a given array using a mask, as shown in this example:

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

The output should be a collection of arrays like the following, where each contiguous region of True values in the mask m (True values next to each other) selects the indices of one subarray.

L[0] = np.array([10, 5])
L[1] = np.array([2, 1])

Upvotes: 4

Views: 1892

Answers (3)

Divakar

Reputation: 221574

Here's one approach -

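import numpy as np

def separate_regions(a, m):
    # Pad the mask with False so every True island has a start and a stop edge
    m0 = np.concatenate(( [False], m, [False] ))
    # Indices where the padded mask flips: alternating island starts and stops
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i+1]] for i in range(0,len(idx),2)]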

Sample run -

In [41]: a = np.array([10, 5, 3, 2, 1])
    ...: m = np.array([True, True, False, True, True])
    ...: 

In [42]: separate_regions(a, m)
Out[42]: [array([10,  5]), array([2, 1])]
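
To see why the padding trick works, it may help to look at the intermediate values for the sample input. The following is only an illustrative walk-through of the same steps; the names pairs and regions are introduced here for the example and are not part of the function above.

```python
import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

# Pad with False on both sides so islands touching either end still
# produce both a rising and a falling edge.
m0 = np.concatenate(([False], m, [False]))

# Positions where the padded mask flips value. Because of the padding,
# the flips always come in (start, stop) pairs.
idx = np.flatnonzero(m0[1:] != m0[:-1])
print(idx)  # [0 2 3 5]

# Group the flat index list into (start, stop) rows and slice a with each.
pairs = idx.reshape(-1, 2)
regions = [a[start:stop] for start, stop in pairs]
print(regions)  # [array([10,  5]), array([2, 1])]
```

Each slice a[start:stop] is a view into a, so no element data is copied for the individual regions.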

Runtime test

Other approach(es) -

# @kazemakase's soln
def zip_split(a, m):
    d = np.diff(m)
    cuts = np.flatnonzero(d) + 1

    asplit = np.split(a, cuts)
    msplit = np.split(m, cuts)

    L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
    return L

Timings -

In [49]: a = np.random.randint(0,9,(100000))

In [50]: m = np.random.rand(100000)>0.2

# @kazemakase's solution
In [51]: %timeit zip_split(a,m)
10 loops, best of 3: 114 ms per loop

# @Daniel Forsman's solution
In [52]: %timeit splitByBool(a,m)
10 loops, best of 3: 25.1 ms per loop

# Proposed in this post
In [53]: %timeit separate_regions(a, m)
100 loops, best of 3: 5.01 ms per loop

Increasing the average length of islands -

In [58]: a = np.random.randint(0,9,(100000))

In [59]: m = np.random.rand(100000)>0.1

In [60]: %timeit zip_split(a,m)
10 loops, best of 3: 64.3 ms per loop

In [61]: %timeit splitByBool(a,m)
100 loops, best of 3: 14 ms per loop

In [62]: %timeit separate_regions(a, m)
100 loops, best of 3: 2.85 ms per loop
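
The timings above were taken with IPython's %timeit. For anyone who wants to repeat the measurement in a plain script, here is a rough sketch using the standard timeit module. The helper name time_separate_regions and its parameters (p_true, repeats, number, seed) are assumptions made for this example, and the numbers it prints will depend on the machine and NumPy version.

```python
import timeit

import numpy as np


def separate_regions(a, m):
    m0 = np.concatenate(([False], m, [False]))
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]


def time_separate_regions(n=100_000, p_true=0.8, repeats=3, number=10, seed=0):
    """Return the best per-call time in seconds on one random input.

    p_true is the probability that a mask entry is True, so a larger
    value gives longer islands on average (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 9, size=n)
    m = rng.random(n) < p_true
    runs = timeit.repeat(lambda: separate_regions(a, m),
                         repeat=repeats, number=number)
    return min(runs) / number


if __name__ == "__main__":
    for p_true in (0.8, 0.9):
        ms = time_separate_regions(p_true=p_true) * 1e3
        print(f"p_true={p_true}: {ms:.2f} ms per call")
```

Fixing the seed keeps the input identical between runs, which makes it easier to compare the approaches against each other.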

Upvotes: 3

MB-F

Reputation: 23637

Sounds like a natural application for np.split.

First, figure out where to cut the array: at every position where the mask changes between True and False. Then discard the segments where the mask is False.

import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

d = np.diff(m)
cuts = np.flatnonzero(d) + 1

asplit = np.split(a, cuts)
msplit = np.split(m, cuts)

L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]

print(L[0])  # [10  5]
print(L[1])  # [2 1]
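
For illustration, here are the intermediate values of the steps above on the same input. This is just a sketch that prints what each step produces; it does not change the approach.

```python
import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

# On a boolean array, np.diff marks the places where neighbours differ.
d = np.diff(m)
print(d)  # [False  True  True False]

# +1 turns "change happens after position i" into "cut before position i + 1".
cuts = np.flatnonzero(d) + 1
print(cuts)  # [2 3]

# Splitting a and m at the same cuts keeps the segments aligned.
asplit = np.split(a, cuts)
msplit = np.split(m, cuts)
print([s.tolist() for s in asplit])  # [[10, 5], [3], [2, 1]]
print([s.tolist() for s in msplit])  # [[True, True], [False], [True, True]]

# Every mask segment is constant, so np.all(mseg) is True exactly
# for the segments that should be kept.
L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
print([s.tolist() for s in L])  # [[10, 5], [2, 1]]
```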

Upvotes: 1

Daniel F

Reputation: 14399

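def splitByBool(a, m):
    # Cut wherever the mask flips; the segments then alternate between True and False
    segments = np.split(a, np.nonzero(np.diff(m))[0] + 1)
    # Keep even-indexed segments if the mask starts with True, odd-indexed otherwise
    return segments[::2] if m[0] else segments[1::2]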

This returns a list of arrays, one for each contiguous run of True in m.
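
The reason stepping by 2 works is that consecutive segments always alternate between True and False, so the value of m[0] decides whether the True runs sit at even or odd positions. Below is a small sketch with a mask that starts with False. The name split_by_bool_safe is made up for this example, and the empty-mask guard is an addition of mine, since m[0] would raise an IndexError on an empty mask.

```python
import numpy as np


def split_by_bool_safe(a, m):
    """Variant of the idea above with a guard for an empty mask (assumption)."""
    a = np.asarray(a)
    m = np.asarray(m, dtype=bool)
    if m.size == 0:
        return []
    # Cut before every position where the mask value changes.
    cuts = np.flatnonzero(m[1:] != m[:-1]) + 1
    segments = np.split(a, cuts)
    # Segments alternate True/False, so pick every second one,
    # starting at 0 if the mask begins with True and at 1 otherwise.
    start = 0 if m[0] else 1
    return segments[start::2]


a = np.array([10, 5, 3, 2, 1])
m = np.array([False, True, True, False, True])
print(split_by_bool_safe(a, m))  # [array([5, 3]), array([1])]
```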

Upvotes: 2
