Reputation: 2092
I am trying to down sample a fixed [Mx1] vector into any given [Nx1] dimensions by using averaging method. I have a dynamic window size that changes every time depending upon the required output array. So, in some cases i get lucky and get window size of int that perfectly fits according to the window size and sometimes i get floating number as a windows size. But, how can i use floating size windows to make a vector of [Nx1] size from a fixed [Mx1] vector?
Below is the code that i have tried:
chunk = 0.35
def fixed_meanVector(vec, chunk):
size = (vec.size*chunk) #size of output according to the chunk
R = (vec.size/size) #windows size to transform array into chunk size
pad_size = math.ceil(float(vec.size)/R)*R - vec.size
vec_padded = np.append(vec, np.zeros(pad_size)*np.NaN)
print "Org Vector: ",vec.size, "output Size: ",size, "Windows Size: ",R, "Padding size", pad_size
newVec = scipy.nanmean(vec_padded.reshape(-1,R), axis=1)
print "New Vector shape: ",newVec.shape
return newVec
print "Word Mean of N values Similarity: ",cosine(fixed_meanVector(vector1, chunk)
,fixed_meanVector(vector2, chunk))
Output:
New Vector shape: (200,)
Org Vector: 400 output Size: 140.0 Windows Size: 2.85714285714 Padding size 0.0
New Vector shape: (200,)
0.46111661289
In above example, I need to down sample [Mx1] ([400x1]) vector in Nx1 ([140x1]) dimensions. So, dynamically window size [2.857x1] can be used to downsample [Mx1] vector . But, in this case i am getting a vector of [200x1] as my output instead of [140x1] due to the floating window it raises to the flour(2.85) it is downsampled with -> [2x1]. Padding is zero because, my window size is perfect for new [Nx1] dimensions. So, is there any way to use such type of windows sizes to down sample a [Mx1] vector?
Upvotes: 2
Views: 2056
Reputation: 18668
It is possible but not natural to vectorise that, as soon as M%N>0
. because the amount of cells used to build the result array is not constant, between 3 and 4 in your case.
The natural method is to run through the array, adjusting at each bin :
the idea is to fill each bin until overflow. then cut the overflow (carry) and keep it for next bin. the last carry is always null using int arithmetic.
The code :
def resized(data,N):
M=data.size
res=empty(N,data.dtype)
carry=0
m=0
for n in range(N):
sum = carry
while m*N - n*M < M :
sum += data[m]
m += 1
carry = (m-(n+1)*M/N)*data[m-1]
sum -= carry
res[n] = sum*N/M
return res
Test :
In [5]: resized(np.ones(7),3)
Out[5]: array([ 1., 1., 1.])
In [6]: %timeit resized(rand(400),140)
1000 loops, best of 3: 1.43 ms per loop
It works, but not very quickly. Fortunatelly, you can speed it with numba
:
from numba import jit
resized2=jit(resized)
In [7]: %timeit resized2(rand(400),140)
1 loops, best of 3: 8.21 µs per loop
Probably faster than any pure numpy
solution (here for N=3*M
):
IN [8]: %timeit rand(402).reshape(-1,3).mean(1)
10000 loops, best of 3: 39.2 µs per loop
Note it works also if M>N
.
In [9]: resized(arange(4.),9)
Out[9]: array([ 0. , 0. , 0.75, 1. , 1.5 , 2. , 2.25, 3. , 3. ])
Upvotes: 3
Reputation: 4875
You're doing it wrong, you build a window for your required decimation, not the other way around.
Mr Nyquist says you can't have a BW above fs/2, or you'll have nasty aliasing.
So to solve it you don't just "average", but lowpass so that frequencies above fs/2 are below your acceptable noise floor.
MA's are a valid type of low pass filter, you're just applying it to the wrong array.
The usual case for arbitrary decimation is.
Upsample -> Lowpass -> Downsample
So, to be able to arbitrary Decimate from N to M samples the algorithm is:
- find LCM between your current samples your target samples.
- upsample by
LCM/N
- design a filter using a stop frequency
ws<= M/LCM
- downsample by
LCM/M
What you call averaging method, is a FIR filter with a rectangular window
If you use the first zero of the frequency response in that window as stop band, then you you can calculate the needed window size K , as
2/K <= M/LCM
so you must use windows of size:
ceil(2*LCM/M) = K
Obviously, you don't need to implement all of this. Just design a proper window with ws<= M/LCM
and apply it using scipy.signal.resample.
And if the ceil
applied to the window messes up your results, don't use rectangular windows, there are tons of better filters you can use.
Upvotes: 2