Reputation: 1329
My example database is as follows:
A B C D
0 5 3 2 1
1 3 3 4 5
2 6 2 7 8
3 4 2 4 0
4 6 2 4 3
I would like to create a rolling window and, for each row, compute a sumproduct of two or more columns. For example, with a lookback of 4 and min_periods
parameter of rolling()
set to 3 (i.e. only computing sumproduct values if there are min. 3 rows of information), I would like to compute the sumproduct of columns A
and C
, and for A
and B
-squared (i.e. A
,B
,B
):
A B C D A&C A&B&B
0 5 3 2 1 NaN NaN
1 3 3 4 5 NaN NaN
2 6 2 7 8 64 96
3 4 2 4 0 80 112
4 6 2 4 3 94 91
I know that using rolling().agg()
it's straightforward to compute column-wise sums/products:
df.rolling(4, min_periods=3).agg(np.sum)
df.rolling(4, min_periods=3).agg(np.product)
but I had no luck in figuring out how can I apply the same aggregation logic across different columns.
Note that my original dataframe is quite large (3k rows, 500 columns) thus I would like to avoid the use of loops if possible.
Upvotes: 1
Views: 387
Reputation: 1329
Well, it actually wasn't that hard, I managed to solve it with a very easy syntax. I'll leave the question with my answer here because there are not many questions in this topic and hopefully it will help others too in the future.
We can just take the columns of our interest, multiply them element-by-element horizontally, and then apply a rolling sum over the multiplication results:
df[['A','C']].product(axis=1).rolling(4, min_periods=3).sum()
df[['A','B','B']].product(axis=1).rolling(4, min_periods=3).sum()
These will yield:
0 NaN
1 NaN
2 64.0
3 80.0
4 94.0
and
0 NaN
1 NaN
2 96.0
3 112.0
4 91.0
respectively which is exactly the desired output.
Upvotes: 1