Reputation: 117
I have the following data, which I multi-index with Date and Ticker as indices and I am then adding a rolling sum of the Vol column for each stock.
Raw Data:
Date,Ticker,SharePrice,Vol
2014-12-31,MSFT,10.79,16.24
2015-03-31,MSFT,19.44,14.94
2015-06-30,MSFT,3.73,19.79
2015-09-30,MSFT,3.76,6.52
2015-12-31,MSFT,10.56,17.91
2016-03-31,MSFT,13.56,11.96
2016-06-30,MSFT,16.27,19.79
2015-03-31,GM,18.22,9.92
2015-06-30,GM,17.16,18.69
2015-09-30,GM,15.92,17.45
Here is the code I use to calculate my rolling sum of vol - note that I do not want the rolling sum to include vol related to a different Ticker (I attempt to use gruopby to stop this but it doesn't work):
Code:
import pandas as pd
stocks = pd.read_csv("C:\\Users\\stocks.csv", index_col=["Date", "Ticker"])
stocks['RollingVol'] = stocks['Vol'].groupby(level=1).fillna(0).rolling(1095, min_periods=2).sum()
print(stocks)
Here is the result I get:
Date,Ticker,SharePrice,Vol,RollingVol
2014-12-31,MSFT,10.79,16.24,
2015-03-31,MSFT,19.44,14.94,31.18
2015-06-30,MSFT,3.73,19.79,50.97
2015-09-30,MSFT,3.76,6.52,57.489999999999995
2015-12-31,MSFT,10.56,17.91,75.39999999999999
2016-03-31,MSFT,13.56,11.96,87.35999999999999
2016-06-30,MSFT,16.27,19.79,107.14999999999998
2015-03-31,GM,18.22,9.92,117.06999999999998
2015-06-30,GM,17.16,18.69,135.76
2015-09-30,GM,15.92,17.45,153.20999999999998
My problem for example here is that the first rolling sum entry for GM (117.0699999) is including the MSFT values whereas it should just be NaN (since min_periods = 2) and then the second entry for GM should be 9.92+18.69= 28.61 and so on as detailed below. I don't understand why the groupby(level=1) in my code is not achieving this and how I can fix it?
Many thanks in advance
Expected Result:
Date,Ticker,SharePrice,Vol,RollingVol
2014-12-31,MSFT,10.79,16.24,
2015-03-31,MSFT,19.44,14.94,31.18
2015-06-30,MSFT,3.73,19.79,50.97
2015-09-30,MSFT,3.76,6.52,57.49
2015-12-31,MSFT,10.56,17.91,75.4
2016-03-31,MSFT,13.56,11.96,87.36
2016-06-30,MSFT,16.27,19.79,107.15
2015-03-31,GM,18.22,9.92,
2015-06-30,GM,17.16,18.69,28.61
2015-09-30,GM,15.92,17.45,46.06
Upvotes: 0
Views: 166
Reputation: 30971
The problem with your code is that when you call groupby then for each group is invoked actually only the first function from the following content, in your example only fillna, which changes nothing.
Invocation of the following methods is performed on the final ("consolidated") result of the preceding groupby.
To compute what you really want, change your code to:
stocks['RollingVol'] = stocks.Vol.groupby(level=1).apply(
lambda grp: grp.rolling(1095, min_periods=2).sum())
The result, for your sample data, is:
SharePrice Vol RollingVol
Date Ticker
2014-12-31 MSFT 10.79 16.24 NaN
2015-03-31 MSFT 19.44 14.94 31.18
2015-06-30 MSFT 3.73 19.79 50.97
2015-09-30 MSFT 3.76 6.52 57.49
2015-12-31 MSFT 10.56 17.91 75.40
2016-03-31 MSFT 13.56 11.96 87.36
2016-06-30 MSFT 16.27 19.79 107.15
2015-03-31 GM 18.22 9.92 NaN
2015-06-30 GM 17.16 18.69 28.61
2015-09-30 GM 15.92 17.45 46.06
Note that the first value in each group is NaN, since you want min_periods=2.
And a final detail to consider: You chose a strikingly big window size (1095). This raises suspicion that you actually want an expanding window, from the start of the current group, up to the current row. Something like:
stocks['RollingVol'] = stocks.Vol.groupby(level=1).apply(
lambda grp: grp.expanding(min_periods=2).sum())
Or maybe you want the rolling sum for 3 years, assuming that you have data for each day.
Upvotes: 1