lopsided

Reputation: 2410

How can I extract indices from a numpy array where the size of a contiguous matching section is larger than some minimum?

Suppose I have some array like

a = np.random.random(100) > 0.5

array([ True,  True, False, False,  True, False, False,  True,  True,
        True, False, False,  True, False,  True, False,  True,  True,
       False, False, False, False, False,  True,...

I want to find the start index of every run of consecutive Trues that is at least X long. So for X=3 in the random snippet above I would want 7, and for X=2 I should get 0, 7, 16.

I can do this with loops, but I am wondering whether anyone can suggest a smarter way.

Upvotes: 0

Views: 141

Answers (3)

Dani Mesejo

Reputation: 61930

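Try scipy.signal.find_peaks, padding the array with a zero on each side so that runs touching either end are also detected as peaks: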

import numpy as np
from scipy.signal import find_peaks

a = np.array([True, True, False, False, True, False, False, True, True,
              True, False, False, True, False, True, False, True, True,
              False, False, False, False, False, True])

_, peaks = find_peaks(np.r_[0, a, 0], width=3)
result = peaks["left_bases"]
print(result)

Output

[7]

For width=2, you have:

_, peaks = find_peaks(np.r_[0, a, 0], width=2)
result = peaks["left_bases"]
print(result)

Output

[ 0  7 16]

Upvotes: 2

user7864386

Reputation:

You can find consecutive Trues by taking the cumulative sum of the boolean array and splitting that cumsum array wherever it stops increasing by 1. Each resulting subarray then holds one run of consecutive numbers, and you extract the starting point of every subarray whose run is at least X long.

def starting_point_of_X_consecutive_Trues(arr, X):
    arr_cumsum = arr.cumsum()
    splits = np.split(arr_cumsum, np.where(np.diff(arr_cumsum) != 1)[0]+1)
    relevant_points = [splits[0][0]] if len(splits[0]) >= X else []
    relevant_points += [split[1] for split in splits[1:] if len(split)-1 >= X]
    return np.isin(arr_cumsum, relevant_points).nonzero()[0]

Output:

starting_point_of_X_consecutive_Trues(a, 3) # [7]
starting_point_of_X_consecutive_Trues(a, 2) # [0,7,16]
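
To see what the split step produces, here is a minimal sketch of the intermediate values on a small made-up array (the input below is an assumption chosen for illustration, not the array from the question):

```python
import numpy as np

# Hypothetical short input: a run of 2 Trues, then a run of 3 Trues.
arr = np.array([True, True, False, True, True, True, False])

# The running total rises by 1 on every True and stays flat on every False.
arr_cumsum = arr.cumsum()
print(arr_cumsum)                    # [1 2 2 3 4 5 5]

# Split wherever the total did not rise by 1, i.e. at each False.
split_points = np.where(np.diff(arr_cumsum) != 1)[0] + 1
print(split_points)                  # [2 6]

splits = np.split(arr_cumsum, split_points)
print([s.tolist() for s in splits])  # [[1, 2], [2, 3, 4, 5], [5]]
```

Every chunk after the first begins with the flat value contributed by a False, which is why the function compares len(split)-1 with X and takes split[1] as the first True of that run.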

Upvotes: 1

Antoine Redier

Reputation: 485

You can use a convolution:

convolution = np.convolve(a, np.array([1, 1, 1]))
np.where(convolution == 3)[0] - 2

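Here the convolution with [1, 1, 1] sums each element with its neighbours. In the default "full" mode, the value at position k is the sum of the element at k and the two elements before it. You can then find all the indices where the sum reaches 3 and subtract 2 to get the start of each window.

Here is the generalisation to any number of consecutive Trues: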
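
As a rough illustration of why subtracting 2 gives the window start, here is a sketch on a short made-up array (the input and the name `window_starts` are assumptions for this example only):

```python
import numpy as np

# Hypothetical short input with a single run of four Trues at indices 1..4.
a = np.array([False, True, True, True, True, False])

# Default mode is "full": the output has len(a) + 3 - 1 = 8 entries, and
# entry k equals a[k] + a[k-1] + a[k-2] (out-of-range terms count as 0).
convolution = np.convolve(a, np.array([1, 1, 1]))
print(convolution)      # [0 1 2 3 3 2 1 0]

# A value of 3 at position k means a[k-2], a[k-1] and a[k] are all True,
# so that window of three starts at k - 2.
window_starts = np.where(convolution == 3)[0] - 2
print(window_starts)    # [1 2]
```

Note that a run of four Trues yields two window starts, because a window of three fits into it at two positions.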

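def find_consecutive_sequences(number_of_consecutive, a):
    # Sum every window of `number_of_consecutive` neighbouring elements.
    convolution = np.convolve(a, np.ones(number_of_consecutive))
    # A full sum marks the last element of an all-True window; shift back to its start.
    return np.where(convolution == number_of_consecutive)[0] - (number_of_consecutive - 1)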

print(find_consecutive_sequences(3, a))
print(find_consecutive_sequences(4, a))
print(find_consecutive_sequences(5, a))

which gives

[ 7 16 17 18]
[16 17]
[16]

for a (slightly modified to test the 4 and 5 cases) being

a = np.array([ True,  True, False, False,  True, False, False,  True,  True,
        True, False, False,  True, False,  True, False,  True,  True,
        True,  True,  True, False, False])
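
The convolution reports every position where a full window fits, so a long run contributes several indices. If only one start index per run is wanted, as in the question, one possible sketch is to keep the window starts that are not preceded by a True. The helper name `run_starts_from_windows` is hypothetical and not part of NumPy:

```python
import numpy as np

def run_starts_from_windows(a, min_length):
    """One start index per run of at least `min_length` Trues (hypothetical helper)."""
    a = np.asarray(a, dtype=bool)
    # Same convolution idea as above: every index where a full window of Trues begins.
    convolution = np.convolve(a.astype(int), np.ones(min_length, dtype=int))
    window_starts = np.where(convolution == min_length)[0] - (min_length - 1)
    # A window start is also a run start when it is index 0 or when the
    # element just before it is False.
    previous_is_false = ~a[np.maximum(window_starts - 1, 0)]
    return window_starts[(window_starts == 0) | previous_is_false]

# The 24 values shown in the question.
question_a = np.array([True, True, False, False, True, False, False, True, True,
                       True, False, False, True, False, True, False, True, True,
                       False, False, False, False, False, True])
print(run_starts_from_windows(question_a, 3))   # [7]
print(run_starts_from_windows(question_a, 2))   # [ 0  7 16]
```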

Upvotes: 1
