Reputation: 2812
Sorry, I am not sure how to phrase the title more accurately.
I have an array that I would like to split into 3 sub-arrays over given index ranges, and then downsample each sub-array to a different size by averaging.
Here is what I have:
import numpy as np

a = np.arange(100)
bins = [5, 4, 3]
split_index = [[20, 39], [40, 59], [60, 80]]
b = []
for count, item in enumerate(bins):
    start = split_index[count][0]
    end = split_index[count][1]
    increment = (end - start) // item
    b_per_band = []
    for i in range(item):
        each_slice = a[start + i * increment : start + (i + 1) * increment]
        b_per_band.append(each_slice.mean())
    b.append(b_per_band)
print(b)
Result:
[[21.0, 24.0, 27.0, 30.0, 33.0], [41.5, 45.5, 49.5, 53.5], [62.5, 68.5, 74.5]]
So I loop through bins, work out the increment for each step, slice the array accordingly, and append the mean of each slice to the result.
But this is really ugly and, most importantly, performs badly. As I am dealing with audio spectra in my case, I would really like to learn a more efficient way of achieving the same result.
Any suggestion?
Upvotes: 4
Views: 296
Reputation: 25544
Here's an option using np.add.reduceat:
a = np.arange(100)
n_in_bin = [5, 4, 3]
split_index = [[20, 39], [40, 59], [60, 80]]
b = []
for i, sl in enumerate(split_index):
    n_per_bin = (sl[1] - sl[0]) // n_in_bin[i]       # number of items per bin
    v = a[sl[0]:sl[0] + n_in_bin[i] * n_per_bin]     # trim the slice to a multiple of the bin size
    sel_bins = np.linspace(0, len(v), n_in_bin[i] + 1, True).astype(int)
    b.append(np.add.reduceat(v, sel_bins[:-1]) / np.diff(sel_bins))
print(b)
# [array([21., 24., 27., 30., 33.]), array([41.5, 45.5, 49.5, 53.5]), array([62.5, 68.5, 74.5])]
Some notes:

- I renamed bins to n_in_bin to clarify things a bit.
- The key step is np.add.reduceat. From my experience, this can be more efficient than looping.
- If there could be NaNs in your input data, check out this Q&A (a NaN-aware sketch follows below).
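The following isn't from the linked Q&A, just a minimal sketch of one way to make the reduceat approach NaN-aware: zero out the NaNs before summing, then divide by the number of valid entries per bin instead of the bin width.

import numpy as np

v = np.array([1.0, np.nan, 3.0, 4.0, np.nan, 6.0])
sel_bins = np.array([0, 2, 4])                               # start index of each bin

valid = ~np.isnan(v)                                         # mask of usable entries
sums = np.add.reduceat(np.where(valid, v, 0.0), sel_bins)    # NaNs contribute 0 to each bin sum
counts = np.add.reduceat(valid.astype(float), sel_bins)      # valid entries per bin
print(sums / counts)
# [1.  3.5 6. ]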
EDIT/REVISION

Since I'm also working on binning stuff at the moment, I tried a couple of things and ran timeit for the three methods shown so far: 'looped' for the one in the question, 'npredat' using np.add.reduceat, and 'npsplit' using np.split. For 100,000 iterations I got the following average time per iteration in µs:
a = np.arange(10000)
bins = [5, 4, 3]
split_index = [[20, 3900], [40, 5900], [60, 8000]]
-->
looped: 127.3, npredat: 116.9, npsplit: 135.5
vs.
a = np.arange(100)
bins = [5, 4, 3]
split_index = [[20, 39], [40, 59], [60, 80]]
-->
looped: 95.2, npredat: 103.5, npsplit: 100.5
However, results were slightly inconsistent across multiple runs of the 100k iterations and might differ on machines other than the one I tried this on. So my conclusion so far is that the differences are marginal: all three options land in the same ballpark, between 1 µs and 1 ms per iteration.
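The original timing code isn't shown; a minimal sketch of such a harness, here timing just the npredat variant, could look like this:

import timeit
import numpy as np

a = np.arange(100)
n_in_bin = [5, 4, 3]
split_index = [[20, 39], [40, 59], [60, 80]]

def npredat():
    # the np.add.reduceat variant from above
    b = []
    for i, sl in enumerate(split_index):
        n_per_bin = (sl[1] - sl[0]) // n_in_bin[i]
        v = a[sl[0]:sl[0] + n_in_bin[i] * n_per_bin]
        sel_bins = np.linspace(0, len(v), n_in_bin[i] + 1, True).astype(int)
        b.append(np.add.reduceat(v, sel_bins[:-1]) / np.diff(sel_bins))
    return b

n_iter = 100_000
total = timeit.timeit(npredat, number=n_iter)
print(f"npredat: {total / n_iter * 1e6:.1f} µs per iteration")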
Upvotes: 2
Reputation: 887
What you're doing looks quite unusual to me, including the setup; a different approach there would probably make the problem much simpler.
Keeping your approach, however, you could try this:
# reuses a, bins and split_index from the question
b = []
for count, item in enumerate(bins):
    start = split_index[count][0]
    end = split_index[count][1]
    increment = (end - start) // item
    # split the trimmed slice into `item` equal chunks and average each
    b_per_band = np.mean(np.split(a[start:start + item * increment], item), axis=1)
    b.append(b_per_band)
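With the question's data this reproduces the original means, now as NumPy arrays:

print(b)
# [array([21., 24., 27., 30., 33.]), array([41.5, 45.5, 49.5, 53.5]), array([62.5, 68.5, 74.5])]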
Upvotes: 0