gogurt

Reputation: 831

Averages of slices on a 1d nparray: how to make it more NumPy-thonic?

As part of some simulations I'm running, I need to eventually perform the following operation on some very long sequences of (real) numbers. Here's the gist:

Given a long 1-d NumPy array, for each position in the array I want to average the values before and after that position, take the difference between the two averages, and load those differences into another nparray of the same dimension as the original array.

Here's my attempt. It works perfectly, except that it gets super slow as the sequence gets longer.

import numpy as np

def test_sequence(npseq):
    n = npseq.shape[0]

    # difference between the mean of everything before position i
    # and the mean of everything after it
    def f(i):
        pre = np.sum(npseq[:i]) / i
        post = np.sum(npseq[(i+1):]) / (n - i)
        return pre - post

    out = np.array([f(i) for i in range(1, n)])

    return out

Seems straightforward enough. But...

In [26]: a = np.random.randint(0,100,100000)
In [27]: %timeit example.test_sequence(a)
1 loops, best of 3: 7.69 s per loop

In [17]: a = np.random.randint(0,100,400000)
In [18]: %timeit example.test_sequence(a)
1 loops, best of 3: 1min 50s per loop

I know that there's probably a smart way to vectorize this, but I'm inexperienced with NumPy. Can anyone point me in the right direction?

EDIT: I originally wrote "sum" instead of "average." I meant "average." My bad. And I know that there might be an off-by-one error in there; I'm not concerned about it for now. The actual problem is slightly more complicated than the version I've presented here, so I'd need to fiddle with it anyway.

Upvotes: 4

Views: 308

Answers (2)

jeremycg

Reputation: 24945

As an alternative, you can also use the (soon to be deprecated) expanding_mean function from pandas:

import numpy as np
import pandas as pd

a = np.array([32, 69, 79, 34,  1, 77, 54, 42, 73, 75])
pd.expanding_mean(a)[:-2:] - pd.expanding_mean(a[::-1])[-3::-1]

The output here matches (I think) a fixed version of your function; as for the off-by-one errors, you can fix them however you'd like:

def test_sequence(npseq):
    n = npseq.shape[0]

    def f(i):
        pre = np.sum(npseq[:i]) / i
        # denominator now matches the length of npseq[(i+1):]
        post = np.sum(npseq[(i+1):]) / (n - i - 1)
        return pre - post

    out = np.array([f(i) for i in range(1, n - 1)])

    return out

test_sequence(a)

array([-22.375     ,  -0.35714286,   6.33333333, -10.7       ,
       -18.        , -14.66666667, -24.57142857, -26.5       ])

The new replacement version should use expanding and mean:

pd.Series(a[:-2:]).expanding().mean() - pd.Series(a[::-1]).expanding().mean()[-3::-1].reset_index(drop = True)

0   -22.375000
1    -0.357143
2     6.333333
3   -10.700000
4   -18.000000
5   -14.666667
6   -24.571429
7   -26.500000
dtype: float64

and some timing:

a = np.random.randint(0,100,100000)
%timeit test_sequence(a)
%timeit pd.Series(a[:-2:]).expanding().mean() - pd.Series(a[::-1]).expanding().mean()[-3::-1].reset_index(drop = True)

1 loop, best of 3: 8.17 s per loop
10 loops, best of 3: 18.5 ms per loop
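
If you want this packaged as a drop-in function that hands back a plain NumPy array, here is a minimal sketch (the name expanding_diff is mine, and it keeps the same off-by-one convention as the one-liner above rather than the exact definition in the question):

import numpy as np
import pandas as pd

def expanding_diff(a):
    # running mean of the prefixes a[:1], a[:2], ..., dropping the last two
    # entries so it lines up with the suffix part below
    pre = pd.Series(a[:-2]).expanding().mean()
    # running mean of the suffixes, computed on the reversed array and then
    # re-reversed; reset_index so the two Series subtract positionally
    post = pd.Series(a[::-1]).expanding().mean().iloc[-3::-1].reset_index(drop=True)
    return (pre - post).to_numpy()

The .to_numpy() at the end just converts the resulting Series back to an ndarray so the caller never has to touch pandas objects.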

Upvotes: 0

Kasravnd

Reputation: 107297

Here is one way using np.cumsum():

np.cumsum(a[::-1])[::-1] - np.cumsum(a)

np.cumsum(a) will generate the running sum of the previous items, and np.cumsum(a[::-1])[::-1] gives the sum of the next items. So if you want to calculate the averages, the counts for the next items will be np.arange(a.size + 1, 1, -1) and the counts for the previous items will be np.arange(1, a.size + 1), so you can just do:

np.cumsum(a[::-1])[::-1]/np.arange(a.size + 1, 1, -1) - np.cumsum(a)/np.arange(1, a.size + 1)

Demo:

In [53]: a
Out[53]: array([32, 69, 79, 34,  1, 77, 54, 42, 73, 75])

In [54]: np.cumsum(a[::-1])[::-1]/np.arange(a.size + 1 , 1, -1) - np.cumsum(a)/np.arange(1, a.size + 1)
Out[54]: 
array([ 16.72727273,  -0.1       , -11.66666667,  -9.        ,
         3.        ,   4.83333333,  -0.62857143,  -1.        ,
        -1.88888889, -16.1       ])
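
If you want to sanity-check the vectorized expression against a plain Python loop, here is a minimal sketch (the loop uses the same denominators as the one-liner above, i.e. the same off-by-one convention, not necessarily the exact one from the question):

import numpy as np

def cumsum_diff(a):
    # vectorized: suffix means minus prefix means, with the same
    # denominators as the expression above
    return (np.cumsum(a[::-1])[::-1] / np.arange(a.size + 1, 1, -1)
            - np.cumsum(a) / np.arange(1, a.size + 1))

def loop_reference(a):
    # straightforward per-position loop with matching denominators
    n = a.size
    out = np.empty(n)
    for i in range(n):
        out[i] = a[i:].sum() / (n + 1 - i) - a[:i + 1].sum() / (i + 1)
    return out

a = np.random.randint(0, 100, 1000)
print(np.allclose(cumsum_diff(a), loop_reference(a)))   # expect True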

Upvotes: 1
