Reputation: 4353
I have a sound signal, imported as a numpy array and I want to cut it into chunks of numpy arrays. However, I want the chunks to contain only elements above a threshold. For example:
threshold = 3
signal = [1,2,6,7,8,1,1,2,5,6,7]
should output two arrays
vec1 = [6,7,8]
vec2 = [5,6,7]
Ok, the above are lists, but you get my point.
Here is what I tried so far, but this just kills my RAM
def slice_raw_audio(audio_signal, threshold=5000):
    signal_slice, chunks = [], []
    for idx in range(0, audio_signal.shape[0], 1000):
        while audio_signal[idx] > threshold:
            signal_slice.append(audio_signal[idx])
        chunks.append(signal_slice)
    return chunks
Upvotes: 3
Views: 2323
Reputation: 221514
Here's one approach -
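import numpy as np

def split_above_threshold(signal, threshold):
    # Pad with False so that runs touching either end still get a start and a stop edge
    mask = np.concatenate(([False], signal > threshold, [False]))
    # Indices where the mask flips: even entries are run starts, odd entries are run stops
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i+1]] for i in range(0, len(idx), 2)]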
Sample run -
In [48]: threshold = 3
...: signal = np.array([1,1,7,1,2,6,7,8,1,1,2,5,6,7,2,8,7,2])
...:
In [49]: split_above_threshold(signal, threshold)
Out[49]: [array([7]), array([6, 7, 8]), array([5, 6, 7]), array([8, 7])]
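To make the mask trick easier to follow, here is a small sketch that prints the intermediate values for the signal from the question. The names `edges`, `starts` and `stops` are only illustrative and are not part of the function above.

```python
import numpy as np

signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
threshold = 3

# Pad the boolean mask with False on both sides so that a run touching
# either end of the signal still produces a start and a stop edge.
mask = np.concatenate(([False], signal > threshold, [False]))

# Positions where the mask flips. Because of the padding there is always
# an even number of them: start, stop, start, stop, ...
edges = np.flatnonzero(mask[1:] != mask[:-1])
starts, stops = edges[::2], edges[1::2]  # stops are exclusive

chunks = [signal[a:b] for a, b in zip(starts, stops)]

print(starts.tolist(), stops.tolist())  # [2, 8] [5, 11]
print([c.tolist() for c in chunks])     # [[6, 7, 8], [5, 6, 7]]
```

Each chunk is a plain slice of `signal`, so no element-by-element appending is needed.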
Other approaches -
# @Psidom's soln
def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)

# @Kasramvd's soln
# Note: [1::2] picks the upper chunks only when signal[0] is not above the threshold
def split_diff_step(signal, threshold):
    return np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]
Timings -
In [67]: signal = np.random.randint(0,9,(100000))
In [68]: threshold = 3
# @Kasramvd's soln
In [69]: %timeit split_diff_step(signal, threshold)
10 loops, best of 3: 39.8 ms per loop
# @Psidom's soln
In [70]: %timeit arange_diff(signal, threshold)
10 loops, best of 3: 20.5 ms per loop
In [71]: %timeit split_above_threshold(signal, threshold)
100 loops, best of 3: 8.22 ms per loop
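If you want to rerun the comparison outside IPython, something along these lines should work. It is only a sketch: the seeded generator, the repeat counts and the output format are my own choices, and `split_diff_step` here uses the generalized start index so that it returns the upper chunks whatever the first sample is. Absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np


def split_above_threshold(signal, threshold):
    mask = np.concatenate(([False], signal > threshold, [False]))
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]


def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0] + 1)


def split_diff_step(signal, threshold):
    pieces = np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
    # Start at 0 if the first sample is above the threshold, else at 1.
    return pieces[int(signal[0] <= threshold)::2]


rng = np.random.default_rng(0)  # fixed seed (an assumption) for repeatable input
signal = rng.integers(0, 9, 100_000)
threshold = 3

for fn in (split_diff_step, arange_diff, split_above_threshold):
    # Best of 3 repeats, 5 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: fn(signal, threshold), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```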
Upvotes: 2
Reputation: 107287
Here is a Numpythonic approach:
In [115]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
Out[115]: [array([1, 2]), array([6, 7, 8]), array([1, 1, 2]), array([5, 6, 7])]
Note that this returns both the lower (below-threshold) and the upper (above-threshold) chunks. Because the split points are exactly the positions where `signal > threshold` changes value,
the two kinds of chunk always alternate, which means that you can simply separate them by indexing with a step of 2:
In [121]: signal = np.array([1,2,6,7,8,1,1,2,5,6,7])
In [122]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[::2]
Out[122]: [array([1, 2]), array([1, 1, 2])]
In [123]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]
Out[123]: [array([6, 7, 8]), array([5, 6, 7])]
You can compare the first item of your signal with the threshold
in order to find out which one of the above slices gives you the upper items.
Generally you can use the following snippet to get the upper items (the test is `<=` because an item equal to the threshold does not count as above it, and `int()` makes the NumPy boolean usable as a slice start):
np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[int(signal[0] <= threshold)::2]
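As a quick check of the start-index logic, here is a minimal sketch wrapped in a function. The name `upper_chunks` is hypothetical, and it assumes a non-empty 1-D array. The start index is derived from the same `signal > threshold` mask used for splitting, so a first sample exactly equal to the threshold is treated as a lower item.

```python
import numpy as np


def upper_chunks(signal, threshold):
    """Return the runs of `signal` that are strictly above `threshold`."""
    above = signal > threshold
    # np.diff on a boolean array marks the positions where the mask flips.
    pieces = np.split(signal, np.flatnonzero(np.diff(above)) + 1)
    # Chunks alternate, so the upper ones start at 0 or at 1
    # depending on whether the first sample is above the threshold.
    first = 0 if above[0] else 1
    return pieces[first::2]


starts_low = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
starts_high = np.array([7, 1, 2, 6, 7, 8])

print([c.tolist() for c in upper_chunks(starts_low, 3)])   # [[6, 7, 8], [5, 6, 7]]
print([c.tolist() for c in upper_chunks(starts_high, 3)])  # [[7], [6, 7, 8]]
```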
Upvotes: 2
Reputation: 214927
Here is one option:
above_th = signal > threshold
index, values = np.arange(signal.size)[above_th], signal[above_th]
np.split(values, np.where(np.diff(index) > 1)[0]+1)
# [array([6, 7, 8]), array([5, 6, 7])]
Wrap in a function:
def above_thresholds(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)

above_thresholds(signal, threshold)
# [array([6, 7, 8]), array([5, 6, 7])]
Upvotes: 1