Reputation: 13
I have a weighted moving average function which smooths a curve by averaging 3*width values to the left and to the right of each point using a gaussian weighting mechanism. I am only worried about smoothing a region bounded by [start, end]. The following code works, but the problem is runtime with large arrays.
import numpy as np
def weighted_moving_average(x, y, start, end, width = 3):
def gaussian(x, a, m, s):
return a*exp(-(x-m)**2/(2*s**2))
cut = (x>=start-3*width)*(x<=end+3*width)
x, y = x[cut], y[cut]
x_avg = x[(x>=start)*(x<=end)]
y_avg = np.zeros(len(x_avg))
bin_vals = np.arange(-3*width,3*width+1)
weights = gaussian(bin_vals, 1, 0, width)
for i in range(len(x_avg)):
y_vals = y[i:i+6*width+1]
y_avg[i] = np.average(y_vals, weights = weights)
return x_avg, y_avg
From my understanding, it is generally inefficient to loop through a NumPy array. I was wondering if anyone had an idea to replace the for loop with something more runtime efficient.
Thanks
Upvotes: 1
Views: 427
Reputation: 828
The main concern with looping over a large array is that the memory allocation for the large array can be expensive, and the whole thing has to be initialized before the loop can start.
In this particular case I'd go with what Divakar is saying.
In general, if you find yourself in a circumstance where you really need to iterate over a large collection, use iterators instead of arrays. For a relatively simple case like this, just replace range
with xrange
(see https://docs.python.org/2/library/functions.html#xrange).
Upvotes: 0
Reputation: 221574
That slicing and summing/averaging on a weighted window basically corresponds to 1D convolution with the kernel being flipped. Now, for 1D
convolution, NumPy has a very efficient implementation in np.convolve
and that could be used to get rid of the loop and give us y_avg
. Thus, we would have a vectorized implementation like so -
y_sums = np.convolve(y,weights[::-1],'valid')
y_avg = np.true_divide(y_sums,weights.sum())
Upvotes: 2