lazarea
lazarea

Reputation: 1329

How to calculate sumproduct on a rolling basis in pandas?

My example database is as follows:

    A   B   C   D
0   5   3   2   1
1   3   3   4   5
2   6   2   7   8
3   4   2   4   0
4   6   2   4   3

I would like to create a rolling window and, for each row, compute a sumproduct of two or more columns. For example, with a lookback of 4 and min_periods parameter of rolling() set to 3 (i.e. only computing sumproduct values if there are min. 3 rows of information), I would like to compute the sumproduct of columns A and C, and for A and B-squared (i.e. A,B,B):

    A   B   C   D   A&C     A&B&B
0   5   3   2   1   NaN     NaN
1   3   3   4   5   NaN     NaN
2   6   2   7   8   64      96
3   4   2   4   0   80      112
4   6   2   4   3   94      91

I know that using rolling().agg() it's straightforward to compute column-wise sums/products:

df.rolling(4, min_periods=3).agg(np.sum)
df.rolling(4, min_periods=3).agg(np.product)

but I had no luck in figuring out how can I apply the same aggregation logic across different columns.

Note that my original dataframe is quite large (3k rows, 500 columns) thus I would like to avoid the use of loops if possible.

Upvotes: 1

Views: 387

Answers (1)

lazarea
lazarea

Reputation: 1329

Well, it actually wasn't that hard, I managed to solve it with a very easy syntax. I'll leave the question with my answer here because there are not many questions in this topic and hopefully it will help others too in the future.

We can just take the columns of our interest, multiply them element-by-element horizontally, and then apply a rolling sum over the multiplication results:

df[['A','C']].product(axis=1).rolling(4, min_periods=3).sum()

df[['A','B','B']].product(axis=1).rolling(4, min_periods=3).sum()

These will yield:

0      NaN
1      NaN
2     64.0
3     80.0
4     94.0

and

0       NaN
1       NaN
2      96.0
3     112.0
4      91.0

respectively which is exactly the desired output.

Upvotes: 1

Related Questions