Pandas DataFrame iterate over a window of rows quickly

Question

I've got a time-series dataframe that looks something like:

datetime                    gesture         left-5-x  ...30 columns omitted
2022-09-27 19:54:54.396680  gesture0255     533 
2022-09-27 19:54:54.403298  gesture0255     534 
2022-09-27 19:54:54.408938  gesture0255     535 
2022-09-27 19:54:54.413995  gesture0255     523 
2022-09-27 19:54:54.418666  gesture0255     522 
... 95 000 rows omitted

I want to create a new column df['cross_correlation'] which is a function of multiple sequential rows, so that the cross_correlation of row i depends on the data from rows i-10 to i+10.

I could do this with df.iterrows(), but that seems like the non-idiomatic approach. Is there a function like

df.window(-10, +10).apply(lambda rows: calculate_cross_correlation(rows))

or similar?

EDIT:

Thanks to @chris, who pointed me towards df.rolling(). However, I now have an example which better reflects the problem I'm having:

Here's a simplified version of the function I want to apply over the moving window. The actual version requires the input to be the full 2D window of shape (window_size, num_columns). The toy function below doesn't strictly need a 2D input, so I've added an assertion that checks the input really is 2D:

def sum_over_2d(x):
    assert len(x.shape) == 2, f'shape of input is {x.shape} and not of length 2'
    return x.sum()

If I now use .rolling with .apply:

df.rolling(window=10, center=True).apply(
    sum_over_2d
)

I get an assertion error:

AssertionError: shape of input is (10,) and not of length 2

and if I print the input x before the assertion, I get:

0    533.0
1    534.0
2    535.0
3    523.0
4    522.0
5    526.0
6    510.0
7    509.0
8    502.0
9    496.0
dtype: float64

which is a single column from my many-columned dataset. What I want is for the input x to be a DataFrame or a 2D NumPy array.
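
One possible way to hand the function a 2D window, sketched under assumptions rather than offered as the definitive approach: build the windows with NumPy's sliding_window_view (NumPy 1.20 or later) and call the function once per window. The helper name rolling_2d_apply and the toy frame are hypothetical, all columns are assumed to be numeric, and the placement of results is only intended to mirror center=True:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def sum_over_2d(x):
    assert len(x.shape) == 2, f'shape of input is {x.shape} and not of length 2'
    return x.sum()


def rolling_2d_apply(df: pd.DataFrame, func, window: int) -> pd.Series:
    """Hypothetical helper: call func on each (window, num_columns) block."""
    values = df.to_numpy(dtype=float)
    # Resulting shape: (n_rows - window + 1, num_columns, window).
    windows = sliding_window_view(values, window_shape=window, axis=0)
    # Transpose each block so func sees (window, num_columns).
    results = np.array([func(w.T) for w in windows])
    out = np.full(len(df), np.nan)
    start = window // 2  # assumption: centred placement of each result
    out[start : start + len(results)] = results
    return pd.Series(out, index=df.index)


# Toy data (assumption): two numeric columns, twelve rows.
toy = pd.DataFrame({"a": np.arange(12.0), "b": np.ones(12)})
result = rolling_2d_apply(toy, sum_over_2d, window=10)
```

The windows are views on the underlying array, so no data is copied when they are built, but func is still called once per row in a Python loop. Any real speed-up would depend on vectorising the function itself over the window axis.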
