J_yang

Reputation: 2812

A more efficient way to resize a NumPy array into different-sized chunks

Sorry I am not sure how to put the title more accurately.

I have an array that I would like to split into 3 sub-arrays (bands); each band should then be downsampled to a different number of values by averaging over fixed-length chunks of the original array.

Here is what I have:

import numpy as np
a = np.arange(100)
bins = [5, 4, 3]                              # number of averaged output values per band
split_index = [[20, 39], [40, 59], [60, 80]]  # [start, end] index of each band in a
b = []
for count, item in enumerate(bins):
    start = split_index[count][0]
    end = split_index[count][1]
    increment = (end - start) // item
    b_per_band = []
    for i in range(item):
        each_slice = a[start + i * increment : start + (i + 1) * increment]
        b_per_band.append(each_slice.mean())
    b.append(b_per_band)
print(b)

Result:

[[21.0, 24.0, 27.0, 30.0, 33.0], [41.5, 45.5, 49.5, 53.5], [62.5, 68.5, 74.5]]

So I loop through bins, work out the increment (chunk length) for each step, slice the array accordingly and append the mean of each chunk to the result.

But this is really ugly and, more importantly, performs badly. As I am dealing with audio spectra in my case, I would really like to learn a more efficient way to achieve the same result.

Any suggestion?

Upvotes: 4

Views: 296

Answers (2)

FObersteiner

Reputation: 25544

Here's an option using np.add.reduceat:

import numpy as np

a = np.arange(100)
n_in_bin = [5, 4, 3]
split_index = [[20, 39], [40, 59], [60, 80]]
b = []
for i, sl in enumerate(split_index):
    n_bins = (sl[1] - sl[0]) // n_in_bin[i]       # samples averaged per output value
    v = a[sl[0]:sl[0] + n_in_bin[i] * n_bins]     # trim so the slice divides evenly
    sel_bins = np.linspace(0, len(v), n_in_bin[i] + 1, endpoint=True).astype(int)
    b.append(np.add.reduceat(v, sel_bins[:-1]) / np.diff(sel_bins))
print(b)
# [array([21., 24., 27., 30., 33.]), array([41.5, 45.5, 49.5, 53.5]), array([62.5, 68.5, 74.5])]

Some notes:

  • I changed the name bins to n_in_bin to clarify things a bit.
  • By using floor division, you discard some data. I don't know if that's really important, just a hint.
  • The thing that should make this code faster, at least for large array sizes and 'chunks', is the use of np.add.reduceat. In my experience, this can be more efficient than looping (see the small sketch after this list).
  • If you have NaNs in your input data, check out this Q&A.
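
To make this a bit more concrete, here is a tiny standalone sketch of what np.add.reduceat does for the first band; the slice, the start indices and the divisor are hard-coded purely for illustration:

import numpy as np

# for the first band: v is the trimmed slice, each output value averages 3 samples
v = np.arange(20, 35)                # a[20:35], length 15
starts = np.array([0, 3, 6, 9, 12])  # start index of each chunk of length 3

sums = np.add.reduceat(v, starts)    # sums of the 5 chunks: [63 72 81 90 99]
print(sums / 3)                      # chunk means: [21. 24. 27. 30. 33.]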

EDIT/REVISION

Since I'm also working on binning at the moment, I tried a couple of things and ran timeit for the three methods shown so far: 'looped' for the one in the question, 'npredat' for the one using np.add.reduceat, and 'npsplit' for the one using np.split. For 100000 iterations, the average time per iteration in [µs] was:

a = np.arange(10000)
bins = [5, 4, 3]
split_index = [[20, 3900], [40, 5900], [60, 8000]]
-->
looped: 127.3, npredat: 116.9, npsplit: 135.5

vs.

a = np.arange(100)
bins = [5, 4, 3]
split_index = [[20, 39], [40, 59], [60, 80]]
-->
looped: 95.2, npredat: 103.5, npsplit: 100.5

However, results were slightly inconsistent across multiple runs of the 100k iterations and might differ on machines other than the one I tried this on. So my conclusion so far would be that the differences are marginal; all 3 options fall within the same range, somewhere between 1 µs and 1 ms per iteration.
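
For reference, a minimal self-contained sketch of how such a timing can be reproduced for one variant (wrapped here as a function npredat; the name just mirrors the label above, and absolute numbers will of course depend on the machine):

import numpy as np
import timeit

a = np.arange(10000)
n_in_bin = [5, 4, 3]
split_index = [[20, 3900], [40, 5900], [60, 8000]]

def npredat():
    # same logic as the np.add.reduceat version above
    b = []
    for i, sl in enumerate(split_index):
        n_bins = (sl[1] - sl[0]) // n_in_bin[i]
        v = a[sl[0]:sl[0] + n_in_bin[i] * n_bins]
        sel_bins = np.linspace(0, len(v), n_in_bin[i] + 1, endpoint=True).astype(int)
        b.append(np.add.reduceat(v, sel_bins[:-1]) / np.diff(sel_bins))
    return b

n = 100000
total = timeit.timeit(npredat, number=n)
print(f"npredat: {total / n * 1e6:.1f} µs per iteration")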

Upvotes: 2

Ardweaden

Reputation: 887

What you're doing looks very odd to me, including the setup, which could probably use a different approach that would make the problem much simpler.

However, using the same approach, you could try this:

# a, bins and split_index as defined in the question
b = []

for count, item in enumerate(bins):
    start = split_index[count][0]
    end = split_index[count][1]
    increment = (end - start) // item

    # split the trimmed slice into `item` equal chunks and average each
    b_per_band = np.mean(np.split(a[start:start + item * increment], item), axis=1)

    b.append(b_per_band)
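
Note that np.split(arr, k) with an integer k requires the array length to divide evenly by k, which is why the slice is trimmed to start + item * increment first. A small standalone illustration (the numbers happen to match the first band):

import numpy as np

x = np.arange(20, 35)           # length 15, splits into 5 equal parts
chunks = np.split(x, 5)         # five arrays of length 3
print(np.mean(chunks, axis=1))  # [21. 24. 27. 30. 33.]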

Upvotes: 0
