Reputation: 473
I have successfully added a new summed Volume column using Transform when grouping by Date like so:
df
Name Date Volume
--------------------------
APL 12-01-2017 1102
BSC 12-01-2017 4500
CDF 12-02-2017 5455
df['vol_all_daily'] = df['Volume'].groupby([df['Date']]).transform('sum')
Name Date Volume vol_all_daily
------------------------------------------
APL 12-01-2017 1102 5602
BSC 12-01-2017 4500 5602
CDF 12-02-2017 5455 5455
However when I want to take the rolling average it doesn't work!
df['vol_all_ma_2']=df['vol_all_daily'].
groupby([df['Date']]).rolling(window=2).mean()
Returns a DataGroupBy that gives error *and becomes too hard to put back into a df column anyways.
df['vol_all_ma_2'] =
df['vol_all_daily'].groupby([df['Date']]).transform('mean').
rolling(window=2).mean()
This just produces near identical result of vol_all_daily column
Update:
I wasn't taking the just one column per date..The above code will still take multiple dates...Instead I add the .first() to the groupby..Not sure why groupby isnt taking one row per date.
Upvotes: 1
Views: 1132
Reputation: 1559
The behavior of what you have written seems correct (Part 1 below), but perhaps you want to be calling something different (Part 2 below).
Part 1: Why what you have written is behaving correctly:
d = {'Name':['APL', 'BSC', 'CDF'],'Date':pd.DatetimeIndex(['2017-12-01', '2017-12-01', '2017-12-02']),'Volume':[1102,4500,5455]}
df = pd.DataFrame(d)
df['vol_all_daily'] = df['Volume'].groupby([df['Date']]).transform('sum')
print(df)
rolling_vol = df['vol_all_daily'].groupby([df['Date']]).rolling(window=2).mean()
print('')
print(rolling_vol)
I get as output:
Date Name Volume vol_all_daily
0 2017-12-01 APL 1102 5602
1 2017-12-01 BSC 4500 5602
2 2017-12-02 CDF 5455 5455
Date
2017-12-01 0 NaN
1 5602.0
2017-12-02 2 NaN
Name: vol_all_daily, dtype: float64
To understand why this result rolling_vol
is correct, notice that you have first called the groupby
, and only after that you have called rolling
. That should not produce something that fits with df
.
Part 2: What I think you wanted to call (just a rolling average):
If you instead run:
# same as above but without groupby
rolling_vol2 = df['vol_all_daily'].rolling(window=2).mean()
print('')
print(rolling_vol2)
You should get:
0 NaN
1 5602.0
2 5528.5
Name: vol_all_daily, dtype: float64
which looks more like the rolling average you seem to want. To explain that, I suggest reading the details of pandas resampling vs rolling.
Upvotes: 1