How can we downsample a 1D array by averaging, using float and int window sizes?

Question

I am trying to downsample a fixed [Mx1] vector to any given [Nx1] size by averaging. The window size is dynamic: it changes every time, depending on the required output size. In some cases I get lucky and the window size is an integer that divides the vector evenly; in other cases the window size is a floating-point number. How can I use a floating-point window size to build an [Nx1] vector from a fixed [Mx1] vector?

Below is the code that I have tried:

chunk = 0.35
def fixed_meanVector(vec, chunk):
   size = (vec.size*chunk) #size of output according to the chunk
   R    = (vec.size/size) #windows size to transform array into chunk size
   pad_size = math.ceil(float(vec.size)/R)*R - vec.size
   vec_padded = np.append(vec, np.zeros(pad_size)*np.NaN)

   print "Org Vector: ",vec.size, "output Size: ",size, "Windows Size: ",R, "Padding size", pad_size
   newVec = scipy.nanmean(vec_padded.reshape(-1,R), axis=1)
   print "New Vector shape: ",newVec.shape
   return newVec

print "Word Mean of N values Similarity: ",cosine(fixed_meanVector(vector1, chunk)
                                                      ,fixed_meanVector(vector2, chunk))

Output:

New Vector shape:  (200,)
Org Vector:  400 output Size:  140.0 Windows Size:  2.85714285714 Padding  size 0.0
New Vector shape:  (200,)
0.46111661289

In the example above, I need to downsample an [Mx1] ([400x1]) vector to [Nx1] ([140x1]), so the required window size is [2.857x1]. But in this case I get a [200x1] vector instead of [140x1]: the floating-point window size is floored, floor(2.857) = 2, so the vector is downsampled with a [2x1] window. The padding is zero because the window size fits the new [Nx1] size exactly. So, is there any way to use this kind of window size to downsample an [Mx1] vector?

B. M. · Accepted Answer

It is possible, but not natural, to vectorise this as soon as M%N>0, because the number of cells used to build each element of the result array is not constant: it varies between 3 and 4 in your case.
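To see where the "between 3 and 4" comes from, here is a small check that is not part of the original answer. It assumes bin n covers the interval [n*M/N, (n+1)*M/N) of the input, measured in cells, and counts how many input cells each bin touches. The helper name `cells_touched` is made up for this illustration.

```python
M, N = 400, 140  # sizes from the question

def cells_touched(n, M, N):
    # Bin n covers [n*M/N, (n+1)*M/N) in units of input cells.
    first = (n * M) // N              # floor of the bin's left edge
    last = -((-(n + 1) * M) // N)     # ceil of the bin's right edge, in integer arithmetic
    return last - first

counts = sorted({cells_touched(n, M, N) for n in range(N)})
print(counts)  # [3, 4]
```

Because the count changes from bin to bin, a single reshape with a fixed row length cannot express the averaging.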

The natural method is to run through the array, adjusting at each bin:

The idea is to fill each bin until it overflows, then cut off the overflow (the carry) and keep it for the next bin. The last carry is always zero when using integer arithmetic.

The code:

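import numpy as np

def resized(data, N):
    # Needs true division: Python 3, or `from __future__ import division` on Python 2.
    M = data.size
    res = np.empty(N, data.dtype)
    carry = 0   # part of the last consumed cell that belongs to the next bin
    m = 0       # index of the next input cell to consume
    for n in range(N):
        total = carry
        # Consume cells while they start before the end of bin n (integer test).
        while m*N - n*M < M:
            total += data[m]
            m += 1
        # Fraction of the last cell that overflows past the end of bin n.
        carry = (m - (n+1)*M/N) * data[m-1]
        total -= carry
        res[n] = total*N/M   # divide by the window size M/N to get the mean
    return res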

Test :

In [5]: resized(np.ones(7),3)
Out[5]: array([ 1.,  1.,  1.])

In [6]: %timeit resized(rand(400),140)
    1000 loops, best of 3: 1.43 ms per loop

It works, but not very quickly. Fortunately, you can speed it up with numba:

from numba import jit
resized2=jit(resized)             

In [7]: %timeit resized2(rand(400),140)
1 loops, best of 3: 8.21 µs per loop

Probably faster than any pure numpy solution (here for M=3*N):

In [8]: %timeit rand(402).reshape(-1,3).mean(1)
10000 loops, best of 3: 39.2 µs per loop

Note that it also works if N>M.

In [9]: resized(arange(4.),9)
Out[9]: array([ 0.  ,  0.  ,  0.75,  1.  ,  1.5 ,  2.  ,  2.25,  3.  ,  3.  ])
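For comparison, here is a minimal sketch of one way the same averaging could be vectorised. It is an alternative technique, not the method from the answer above. It assumes the input is treated as a piecewise-constant signal, so its running integral is piecewise linear and can be evaluated at fractional bin edges with `np.interp`. The name `resized_interp` is made up for this illustration.

```python
import numpy as np

def resized_interp(data, N):
    """Average `data` (length M) over N equal-width bins of M/N cells each."""
    data = np.asarray(data, dtype=float)
    M = data.size
    # Running integral of the signal at the integer cell boundaries 0..M.
    cum = np.concatenate(([0.0], np.cumsum(data)))
    # Bin edges in units of input cells; they are fractional when M % N > 0.
    edges = np.arange(N + 1) * M / N
    # Linear interpolation of the running integral is exact for a
    # piecewise-constant signal.
    integral = np.interp(edges, np.arange(M + 1), cum)
    # Integral over each bin divided by the bin width M/N gives the bin mean.
    return np.diff(integral) * N / M

print(resized_interp(np.ones(7), 3))
print(resized_interp(np.arange(4.), 9))
print(resized_interp(np.random.rand(400), 140).shape)
```

On the two test inputs used above it should reproduce the same values as the loop version, up to floating-point rounding. No timing is claimed for it here.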
