Reputation: 1023
I've got a time-series dataframe that looks something like:
datetime gesture left-5-x ...30 columns omitted
2022-09-27 19:54:54.396680 gesture0255 533
2022-09-27 19:54:54.403298 gesture0255 534
2022-09-27 19:54:54.408938 gesture0255 535
2022-09-27 19:54:54.413995 gesture0255 523
2022-09-27 19:54:54.418666 gesture0255 522
... 95 000 rows omitted
And I want to create a new column df['cross_correlation'] which is a function of multiple sequential rows. So the cross_correlation of row i depends on the data from rows i-10 to i+10.
I could do this with df.iterrows(), but that seems like the non-idiomatic version. Is there a function like df.window(-10, +10).apply(lambda rows: calculate_cross_correlation(rows)) or similar?
EDIT:
Thanks @chris, who pointed me towards df.rolling(), although I now have this example which better reflects the problem I'm having:
Here's a simplified version of the function I want to apply over the moving window. Note that the actual version requires the input to be the full 2D window of shape (window_size, num_columns), whereas the toy function below doesn't actually need a 2D input. I've added an assertion to check that the input really is 2D:
def sum_over_2d(x):
    assert len(x.shape) == 2, f'shape of input is {x.shape} and not of length 2'
    return x.sum()
And now if I use .rolling with .apply:
df.rolling(window=10, center=True).apply(sum_over_2d)
I get an assertion error:
AssertionError: shape of input is (10,) and not of length 2
and if I print the input x before the assertion, I get:
0 533.0
1 534.0
2 535.0
3 523.0
4 522.0
5 526.0
6 510.0
7 509.0
8 502.0
9 496.0
dtype: float64
which is one column from my many-columned dataset. What I want is for the input x to be a dataframe or 2D numpy array.
Upvotes: 0
Views: 543
Reputation: 29732
IIUC, one way is to use pandas.Series.rolling (with .apply if you need a custom function).
Example with sum:
df["new"] = df["left-5-x"].rolling(3, center=True, min_periods=1).sum()
Output:
datetime gesture left-5-x new explain
0 2022-09-27 19:54:54.396680 gesture0255 533 1067.0 533+534
1 2022-09-27 19:54:54.403298 gesture0255 534 1602.0 533+534+535
2 2022-09-27 19:54:54.408938 gesture0255 535 1592.0 534+535+523
3 2022-09-27 19:54:54.413995 gesture0255 523 1580.0 535+523+522
4 2022-09-27 19:54:54.418666 gesture0255 522 1045.0 523+522
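You can see that each left-5-x value is summed with its -1 and +1 neighbors. The first and last rows have only one neighbor, and min_periods=1 lets them produce a value anyway.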
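For a custom function on a single column, the same rolling object can be combined with .apply. Below is a minimal sketch on the five sample rows from the question; value_range is a made-up helper for illustration, not the cross-correlation from the question, and it still only ever sees one column at a time.

```python
import pandas as pd

# The five sample values of left-5-x shown in the question.
df = pd.DataFrame({"left-5-x": [533, 534, 535, 523, 522]})


def value_range(window):
    """Hypothetical helper: spread (max - min) of one 1D window."""
    return window.max() - window.min()


# raw=True hands each window to the function as a 1D NumPy array.
# Edge windows are shorter because min_periods=1 allows partial windows.
df["range"] = (
    df["left-5-x"]
    .rolling(3, center=True, min_periods=1)
    .apply(value_range, raw=True)
)
print(df)
```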
Edit:
If you want to work with the rolled dataframe itself, one way would be to iterate over the rolling object, which yields each window as a 2D DataFrame:
new_df = pd.concat([sum_over_2d(d) for d in df.rolling(window=10)],axis=1).T
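Output (for a toy 10x4 frame holding the integers 0 to 39, not the data above; with only 10 rows and a trailing window of 10, each row is a running column sum):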
0 1 2 3
0 0 1 2 3
1 4 6 8 10
2 12 15 18 21
3 24 28 32 36
4 40 45 50 55
5 60 66 72 78
6 84 91 98 105
7 112 120 128 136
8 144 153 162 171
9 180 190 200 210
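To make the iteration approach reproducible, here is a self-contained sketch. The toy frame is an assumption inferred from the numbers in the output (the integers 0 to 39 in a 10x4 layout), and iterating over a rolling object needs a reasonably recent pandas (1.1 or later, as far as I know).

```python
import numpy as np
import pandas as pd


def sum_over_2d(x):
    # Each window arrives as a DataFrame, so the shape is always 2D.
    assert len(x.shape) == 2, f'shape of input is {x.shape} and not of length 2'
    return x.sum()


# Assumed toy data: 10 rows x 4 columns holding 0..39.
df = pd.DataFrame(np.arange(40).reshape(10, 4))

# Iterating over the rolling object yields one DataFrame per row.
# The early windows are shorter than 10 rows but still 2D.
new_df = pd.concat(
    [sum_over_2d(d) for d in df.rolling(window=10)], axis=1
).T
print(new_df)
```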
Or, as per @Sandwichnick's comment, you can use method="table", but only if you pass engine="numba". In other words, your sum_over_2d must be numba-compilable (which is beyond the scope of this question and my knowledge):
df.rolling(window=10, center=True, method="table").sum(engine="numba")
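If numba is not an option, another possibility is to build the 2D windows with NumPy directly. This is only a sketch under a few assumptions: every column is numeric, NumPy is 1.20 or later (for sliding_window_view), and the function returns one scalar per window. rolling_apply_2d and half_width are names invented here, not part of pandas.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def sum_over_2d(x):
    assert len(x.shape) == 2, f'shape of input is {x.shape} and not of length 2'
    return x.sum()


def rolling_apply_2d(frame, func, half_width):
    """Hypothetical helper: apply func to centered (window, num_columns) blocks.

    Rows closer than half_width to either end get NaN.
    """
    values = frame.to_numpy(dtype=float)
    window = 2 * half_width + 1
    # Shape is (n - window + 1, num_columns, window): the window axis comes last.
    views = sliding_window_view(values, window_shape=window, axis=0)
    out = np.full(len(frame), np.nan)
    for k, view in enumerate(views):
        # Transpose so func sees (window, num_columns), as the question asks.
        out[k + half_width] = func(view.T)
    return pd.Series(out, index=frame.index)


toy = pd.DataFrame(np.arange(40).reshape(10, 4))
result = rolling_apply_2d(toy, sum_over_2d, half_width=1)
print(result)
```

For the question's i-10 to i+10 window this would be called with half_width=10 on the numeric columns only. sliding_window_view returns views rather than copies, so the memory cost stays low, but the Python-level loop still runs once per row.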
Upvotes: 2