Nomiluks
Nomiluks

Reputation: 2092

How we can down sample a 1D array values by averaging method using float and int window sizes?

I am trying to down sample a fixed [Mx1] vector into any given [Nx1] dimensions by using averaging method. I have a dynamic window size that changes every time depending upon the required output array. So, in some cases i get lucky and get window size of int that perfectly fits according to the window size and sometimes i get floating number as a windows size. But, how can i use floating size windows to make a vector of [Nx1] size from a fixed [Mx1] vector?

Below is the code that i have tried:

chunk = 0.35
def fixed_meanVector(vec, chunk):
   size = (vec.size*chunk) #size of output according to the chunk
   R    = (vec.size/size) #windows size to transform array into chunk size
   pad_size = math.ceil(float(vec.size)/R)*R - vec.size
   vec_padded = np.append(vec, np.zeros(pad_size)*np.NaN)

   print "Org Vector: ",vec.size, "output Size: ",size, "Windows Size: ",R, "Padding size", pad_size
   newVec = scipy.nanmean(vec_padded.reshape(-1,R), axis=1)
   print "New Vector shape: ",newVec.shape
   return newVec

print "Word Mean of N values Similarity: ",cosine(fixed_meanVector(vector1, chunk)
                                                      ,fixed_meanVector(vector2, chunk))

Output:

New Vector shape:  (200,)
Org Vector:  400 output Size:  140.0 Windows Size:  2.85714285714 Padding  size 0.0
New Vector shape:  (200,)
0.46111661289

In above example, I need to down sample [Mx1] ([400x1]) vector in Nx1 ([140x1]) dimensions. So, dynamically window size [2.857x1] can be used to downsample [Mx1] vector . But, in this case i am getting a vector of [200x1] as my output instead of [140x1] due to the floating window it raises to the flour(2.85) it is downsampled with -> [2x1]. Padding is zero because, my window size is perfect for new [Nx1] dimensions. So, is there any way to use such type of windows sizes to down sample a [Mx1] vector?

Upvotes: 2

Views: 2056

Answers (2)

B. M.
B. M.

Reputation: 18668

It is possible but not natural to vectorise that, as soon as M%N>0. because the amount of cells used to build the result array is not constant, between 3 and 4 in your case.

The natural method is to run through the array, adjusting at each bin :

enter image description here

the idea is to fill each bin until overflow. then cut the overflow (carry) and keep it for next bin. the last carry is always null using int arithmetic.

The code :

def resized(data,N):
    M=data.size
    res=empty(N,data.dtype)
    carry=0
    m=0
    for n in range(N):
        sum = carry
        while m*N - n*M < M :
            sum += data[m]
            m += 1
        carry = (m-(n+1)*M/N)*data[m-1]
        sum -= carry
        res[n] = sum*N/M
    return res

Test :

In [5]: resized(np.ones(7),3)
Out[5]: array([ 1.,  1.,  1.])

In [6]: %timeit resized(rand(400),140)
    1000 loops, best of 3: 1.43 ms per loop

It works, but not very quickly. Fortunatelly, you can speed it with numba :

from numba import jit
resized2=jit(resized)             

In [7]: %timeit resized2(rand(400),140)
1 loops, best of 3: 8.21 µs per loop

Probably faster than any pure numpy solution (here for N=3*M):

IN [8]: %timeit rand(402).reshape(-1,3).mean(1)
10000 loops, best of 3: 39.2 µs per loop

Note it works also if M>N.

In [9]: resized(arange(4.),9)
Out[9]: array([ 0.  ,  0.  ,  0.75,  1.  ,  1.5 ,  2.  ,  2.25,  3.  ,  3.  ])

Upvotes: 3

xvan
xvan

Reputation: 4875

You're doing it wrong, you build a window for your required decimation, not the other way around.

Mr Nyquist says you can't have a BW above fs/2, or you'll have nasty aliasing.

So to solve it you don't just "average", but lowpass so that frequencies above fs/2 are below your acceptable noise floor.

MA's are a valid type of low pass filter, you're just applying it to the wrong array.

The usual case for arbitrary decimation is.

 Upsample -> Lowpass -> Downsample

So, to be able to arbitrary Decimate from N to M samples the algorithm is:

  • find LCM between your current samples your target samples.
  • upsample by LCM/N
  • design a filter using a stop frequency ws<= M/LCM
  • downsample by LCM/M

What you call averaging method, is a FIR filter with a rectangular window

If you use the first zero of the frequency response in that window as stop band, then you you can calculate the needed window size K , as

2/K <= M/LCM

so you must use windows of size:

ceil(2*LCM/M) = K

Obviously, you don't need to implement all of this. Just design a proper window with ws<= M/LCM and apply it using scipy.signal.resample.

And if the ceil applied to the window messes up your results, don't use rectangular windows, there are tons of better filters you can use.

Upvotes: 2

Related Questions