cfadr2021
cfadr2021

Reputation: 117

Why isn't the groupby method working in this rolling sum calculation in Pandas and how can I fix it?

I have the following data, which I multi-index with Date and Ticker as indices and I am then adding a rolling sum of the Vol column for each stock.

Raw Data:

Date,Ticker,SharePrice,Vol
2014-12-31,MSFT,10.79,16.24
2015-03-31,MSFT,19.44,14.94
2015-06-30,MSFT,3.73,19.79
2015-09-30,MSFT,3.76,6.52
2015-12-31,MSFT,10.56,17.91
2016-03-31,MSFT,13.56,11.96
2016-06-30,MSFT,16.27,19.79
2015-03-31,GM,18.22,9.92
2015-06-30,GM,17.16,18.69
2015-09-30,GM,15.92,17.45

Here is the code I use to calculate my rolling sum of vol - note that I do not want the rolling sum to include vol related to a different Ticker (I attempt to use gruopby to stop this but it doesn't work):

Code:

import pandas as pd

stocks = pd.read_csv("C:\\Users\\stocks.csv", index_col=["Date", "Ticker"])

stocks['RollingVol'] = stocks['Vol'].groupby(level=1).fillna(0).rolling(1095, min_periods=2).sum()

print(stocks)

Here is the result I get:

Date,Ticker,SharePrice,Vol,RollingVol
2014-12-31,MSFT,10.79,16.24,
2015-03-31,MSFT,19.44,14.94,31.18
2015-06-30,MSFT,3.73,19.79,50.97
2015-09-30,MSFT,3.76,6.52,57.489999999999995
2015-12-31,MSFT,10.56,17.91,75.39999999999999
2016-03-31,MSFT,13.56,11.96,87.35999999999999
2016-06-30,MSFT,16.27,19.79,107.14999999999998
2015-03-31,GM,18.22,9.92,117.06999999999998
2015-06-30,GM,17.16,18.69,135.76
2015-09-30,GM,15.92,17.45,153.20999999999998

My problem for example here is that the first rolling sum entry for GM (117.0699999) is including the MSFT values whereas it should just be NaN (since min_periods = 2) and then the second entry for GM should be 9.92+18.69= 28.61 and so on as detailed below. I don't understand why the groupby(level=1) in my code is not achieving this and how I can fix it?

Many thanks in advance

Expected Result:

Date,Ticker,SharePrice,Vol,RollingVol
2014-12-31,MSFT,10.79,16.24,
2015-03-31,MSFT,19.44,14.94,31.18
2015-06-30,MSFT,3.73,19.79,50.97
2015-09-30,MSFT,3.76,6.52,57.49
2015-12-31,MSFT,10.56,17.91,75.4
2016-03-31,MSFT,13.56,11.96,87.36
2016-06-30,MSFT,16.27,19.79,107.15
2015-03-31,GM,18.22,9.92,
2015-06-30,GM,17.16,18.69,28.61
2015-09-30,GM,15.92,17.45,46.06

Upvotes: 0

Views: 166

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30971

The problem with your code is that when you call groupby then for each group is invoked actually only the first function from the following content, in your example only fillna, which changes nothing.

Invocation of the following methods is performed on the final ("consolidated") result of the preceding groupby.

To compute what you really want, change your code to:

stocks['RollingVol'] = stocks.Vol.groupby(level=1).apply(
lambda grp: grp.rolling(1095, min_periods=2).sum())

The result, for your sample data, is:

                   SharePrice    Vol  RollingVol
Date       Ticker                               
2014-12-31 MSFT         10.79  16.24         NaN
2015-03-31 MSFT         19.44  14.94       31.18
2015-06-30 MSFT          3.73  19.79       50.97
2015-09-30 MSFT          3.76   6.52       57.49
2015-12-31 MSFT         10.56  17.91       75.40
2016-03-31 MSFT         13.56  11.96       87.36
2016-06-30 MSFT         16.27  19.79      107.15
2015-03-31 GM           18.22   9.92         NaN
2015-06-30 GM           17.16  18.69       28.61
2015-09-30 GM           15.92  17.45       46.06

Note that the first value in each group is NaN, since you want min_periods=2.

And a final detail to consider: You chose a strikingly big window size (1095). This raises suspicion that you actually want an expanding window, from the start of the current group, up to the current row. Something like:

stocks['RollingVol'] = stocks.Vol.groupby(level=1).apply(
    lambda grp: grp.expanding(min_periods=2).sum())

Or maybe you want the rolling sum for 3 years, assuming that you have data for each day.

Upvotes: 1

Related Questions