Qubix

Reputation: 4353

Extract subarrays of numpy array whose values are above a threshold

I have a sound signal imported as a NumPy array, and I want to cut it into chunks (sub-arrays). Each chunk should contain only consecutive elements that are above a threshold. For example:

threshold = 3
signal = [1,2,6,7,8,1,1,2,5,6,7]

should output two arrays

vec1 = [6,7,8]
vec2 = [5,6,7]

OK, the above are lists, but you get my point.

Here is what I have tried so far, but it just eats all my RAM:

def slice_raw_audio(audio_signal, threshold=5000):

    signal_slice, chunks = [], []

    for idx in range(0, audio_signal.shape[0], 1000):
        while audio_signal[idx] > threshold:
            signal_slice.append(audio_signal[idx])
        chunks.append(signal_slice)
    return chunks

Upvotes: 3

Views: 2323

Answers (3)

Divakar

Reputation: 221514

Here's one approach -

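import numpy as np

def split_above_threshold(signal, threshold):
    # Pad the mask with False so runs touching either end get both edges
    mask = np.concatenate(([False], signal > threshold, [False]))
    # Indices where the mask flips: even entries are starts, odd are stops
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i+1]] for i in range(0, len(idx), 2)]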

Sample run -

In [48]: threshold = 3
    ...: signal = np.array([1,1,7,1,2,6,7,8,1,1,2,5,6,7,2,8,7,2])
    ...: 

In [49]: split_above_threshold(signal, threshold)
Out[49]: [array([7]), array([6, 7, 8]), array([5, 6, 7]), array([8, 7])]
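
To see why the padding works, it may help to inspect the intermediate values on the example from the question. This is only an illustrative walk-through; the names `starts`, `stops` and `chunks` are not part of the function above.

```python
import numpy as np

signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
threshold = 3

# Boolean mask padded with False on both sides, so a run that touches
# the first or last element still produces a rising and a falling edge.
mask = np.concatenate(([False], signal > threshold, [False]))

# Positions (in signal coordinates) where the mask changes value.
idx = np.flatnonzero(mask[1:] != mask[:-1])

# Edges alternate: even entries open a run, odd entries close it.
starts, stops = idx[::2], idx[1::2]
chunks = [signal[a:b] for a, b in zip(starts, stops)]

print(idx)     # [ 2  5  8 11]
print(chunks)  # [array([6, 7, 8]), array([5, 6, 7])]
```

Because each chunk is a basic slice of `signal`, the returned arrays are views and no element data is copied.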

Runtime test

Other approaches -

# @Psidom's soln
def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)

# @Kasramvd's soln   
def split_diff_step(signal, threshold):   
    return np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]

Timings -

In [67]: signal = np.random.randint(0,9,(100000))

In [68]: threshold = 3

# @Kasramvd's soln 
In [69]: %timeit split_diff_step(signal, threshold)
10 loops, best of 3: 39.8 ms per loop

# @Psidom's soln
In [70]: %timeit arange_diff(signal, threshold)
10 loops, best of 3: 20.5 ms per loop

In [71]: %timeit split_above_threshold(signal, threshold)
100 loops, best of 3: 8.22 ms per loop
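
The timings only compare speed, so a quick agreement check seems worthwhile. The sketch below redefines the three functions so it runs on its own; `as_lists`, `low_start` and `high_start` are hypothetical names used for illustration. One thing it shows is that `split_diff_step`, as written with `[1::2]`, assumes the signal starts at or below the threshold.

```python
import numpy as np

def split_above_threshold(signal, threshold):
    mask = np.concatenate(([False], signal > threshold, [False]))
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i+1]] for i in range(0, len(idx), 2)]

def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0] + 1)

def split_diff_step(signal, threshold):
    return np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]

def as_lists(chunks):
    # Convert to plain lists so results can be compared with ==
    return [c.tolist() for c in chunks]

funcs = (split_above_threshold, arange_diff, split_diff_step)

# Signal that starts below the threshold: all three agree.
low_start = np.array([1, 1, 7, 1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7, 2, 8, 7, 2])
low_results = [as_lists(f(low_start, 3)) for f in funcs]
print(low_results[0] == low_results[1] == low_results[2])  # True

# Signal that starts above the threshold: the fixed [1::2] offset
# picks the below-threshold runs instead.
high_start = np.array([7, 1, 2, 6])
high_results = [as_lists(f(high_start, 3)) for f in funcs]
print(high_results)  # [[[7], [6]], [[7], [6]], [[1, 2]]]
```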

Upvotes: 2

Kasravnd

Reputation: 107287

Here is a Numpythonic approach:

In [115]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
Out[115]: [array([1, 2]), array([6, 7, 8]), array([1, 1, 2]), array([5, 6, 7])]

Note that this returns both the below-threshold and the above-threshold runs. Because the split points are exactly the positions where the comparison result changes (found with diff over contiguous items), the two kinds of runs always alternate, so you can separate them by slicing with a step of 2:

In [121]: signal = np.array([1,2,6,7,8,1,1,2,5,6,7])

In [122]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[::2]
Out[122]: [array([1, 2]), array([1, 1, 2])]

In [123]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]
Out[123]: [array([6, 7, 8]), array([5, 6, 7])]

To find out which of the two slices above holds the above-threshold items, compare the first item of the signal with the threshold: if it is above, the runs you want start at index 0, otherwise at index 1.

In general, you can use the following snippet to get the above-threshold items:

np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[int(signal[0] <= threshold)::2]
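
The selection rule can also be wrapped in a small helper that derives the starting offset from the mask itself, so the comparison is written only once. This is a minimal sketch; `upper_chunks` is a hypothetical name, and a non-empty 1-D array is assumed.

```python
import numpy as np

def upper_chunks(signal, threshold):
    """Return the runs of consecutive values strictly above threshold.

    Assumes signal is a non-empty 1-D NumPy array.
    """
    above = signal > threshold
    # Split wherever the comparison result flips between neighbours.
    pieces = np.split(signal, np.flatnonzero(np.diff(above)) + 1)
    # Pieces alternate; the first one is an "upper" run only when the
    # first element is itself above the threshold.
    start = 0 if above[0] else 1
    return pieces[start::2]

print(upper_chunks(np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7]), 3))
# [array([6, 7, 8]), array([5, 6, 7])]
print(upper_chunks(np.array([7, 1, 2, 6]), 3))
# [array([7]), array([6])]
```

Using the mask's first entry means a first element equal to the threshold is treated as "not above", consistent with the `signal > threshold` comparison.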

Upvotes: 2

akuiper

Reputation: 214927

Here is one option:

above_th = signal > threshold
index, values = np.arange(signal.size)[above_th], signal[above_th]
np.split(values, np.where(np.diff(index) > 1)[0]+1)
# [array([6, 7, 8]), array([5, 6, 7])]

Wrap in a function:

def above_thresholds(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)

above_thresholds(signal, threshold)
# [array([6, 7, 8]), array([5, 6, 7])]
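
For readers wondering why `np.diff(index) > 1` marks the chunk boundaries, here is an illustrative walk-through on the question's example. The name `cut_points` is introduced only for this sketch.

```python
import numpy as np

signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
threshold = 3

above_th = signal > threshold
index = np.arange(signal.size)[above_th]   # positions of the kept values
values = signal[above_th]                  # the kept values themselves

# Inside one run the positions increase by exactly 1; a jump larger
# than 1 means some below-threshold elements were skipped in between.
gaps = np.diff(index)
cut_points = np.where(gaps > 1)[0] + 1

print(index)       # [ 2  3  4  8  9 10]
print(gaps)        # [1 1 4 1 1]
print(cut_points)  # [3]
print(np.split(values, cut_points))
# [array([6, 7, 8]), array([5, 6, 7])]
```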

Upvotes: 1
