pvstrln

Reputation: 179

How to perform a rolling sum along a matrix axis?

Given a matrix X with T rows and k columns:

T = 50
H = 10
k = 5 
X = np.arange(T).reshape(T,1)*np.ones((T,k))

How to perform a rolling cumulative sum of X along the rows axis with lag H?

Xcum = np.zeros((T-H,k))
for t in range(H,T):
    Xcum[t-H,:] = np.sum( X[t-H:t,:], axis=0 )

Notice, preferably avoiding strides and convolution, under broadcasting/vectorization best practices.

Upvotes: 4

Views: 1432

Answers (3)

hpaulj

Reputation: 231415

Here's a strided solution. I realize it's not what you want, but I wondered how it compares.

def foo2(X):
    # view X as H overlapping windows of length T-H+1 (no data copy)
    temp = np.lib.stride_tricks.as_strided(X, shape=(H, T-H+1, k),
        strides=(k*8,) + X.strides)
    # return temp.sum(0)
    return np.einsum('ijk->jk', temp)

This times at 35 us, compared to 22 us for Jaime's cumsum solution. einsum is a bit faster than sum(0). temp uses X's data, so there's no memory penalty. But it is harder to understand.
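For what it's worth, newer NumPy (1.20+) offers `sliding_window_view`, which builds the same zero-copy view without hand-computed strides; a minimal sketch of the equivalent:

```python
import numpy as np

T, H, k = 50, 10, 5
X = np.arange(T).reshape(T, 1) * np.ones((T, k))

# sliding_window_view builds the strided view safely; window_shape=H
# slides along axis 0, producing shape (T-H+1, k, H)
windows = np.lib.stride_tricks.sliding_window_view(X, window_shape=H, axis=0)
Xcum = windows.sum(axis=-1)  # sum over the window axis -> (T-H+1, k)
print(Xcum.shape)  # (41, 5)
```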

Upvotes: 0

Jaime

Reputation: 67437

You are actually missing the last row of your rolling sum; this would be the correct output:

Xcum = np.zeros((T-H+1, k))
for t in range(H, T+1):
    Xcum[t-H, :] = np.sum(X[t-H:t, :], axis=0)

If you need to do this over an arbitrary axis with numpy only, the simplest approach is to take a np.cumsum along that axis, then compute your result as the difference of two slices of it. With your sample array and axis:

temp = np.cumsum(X, axis=0)
Xcum = np.empty((T-H+1, k))
Xcum[0] = temp[H-1]
Xcum[1:] = temp[H:] - temp[:-H]
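Put together as a runnable check, the cumsum trick matches the corrected loop exactly:

```python
import numpy as np

T, H, k = 50, 10, 5
X = np.arange(T).reshape(T, 1) * np.ones((T, k))

# loop reference (corrected version with T-H+1 rows)
ref = np.array([X[t-H:t].sum(axis=0) for t in range(H, T + 1)])

# cumsum-based rolling sum: each window is a difference of two prefix sums
temp = np.cumsum(X, axis=0)
Xcum = np.empty((T - H + 1, k))
Xcum[0] = temp[H - 1]
Xcum[1:] = temp[H:] - temp[:-H]

assert np.allclose(Xcum, ref)
```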

Another option is to use pandas and its rolling_sum function, which against all odds apparently works on 2D arrays just as you need it to:

import pandas as pd
Xcum = pd.rolling_sum(X, 10)[9:] # first 9 entries are NaN
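Note that `pd.rolling_sum` has since been removed from pandas; in recent versions the equivalent goes through the DataFrame `.rolling` API, which likewise works column-wise on 2D data (a sketch, assuming pandas with the `.rolling` accessor):

```python
import numpy as np
import pandas as pd

T, H, k = 50, 10, 5
X = np.arange(T).reshape(T, 1) * np.ones((T, k))

# .rolling(H).sum() leaves the first H-1 rows as NaN, so slice them off
Xcum = pd.DataFrame(X).rolling(H).sum().to_numpy()[H - 1:]
print(Xcum.shape)  # (41, 5)
```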

Upvotes: 1

Daniel

Reputation: 19547

Sounds like you want the following:

import scipy.signal
scipy.signal.convolve2d(X, np.ones((H,1)), mode='valid')

This of course uses convolve, but the question, as stated, is a convolution operation. Broadcasting would result in a much slower and more memory-intensive algorithm.
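As a quick sanity check on the shapes and values (a small runnable sketch with the question's parameters):

```python
import numpy as np
from scipy.signal import convolve2d

T, H, k = 50, 10, 5
X = np.arange(T).reshape(T, 1) * np.ones((T, k))

# mode='valid' keeps only windows fully inside X, giving (T-H+1, k)
Xcum = convolve2d(X, np.ones((H, 1)), mode='valid')
print(Xcum.shape)  # (41, 5)
```

The first output row is the sum of rows 0..H-1 of X, i.e. 0+1+...+9 = 45 in every column.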

Upvotes: 3
