Danny W
Danny W

Reputation: 473

Pandas add column based on grouped by rolling average

I have successfully added a new summed Volume column using Transform when grouping by Date like so:

 df

 Name   Date        Volume
 --------------------------
 APL    12-01-2017   1102
 BSC    12-01-2017   4500
 CDF    12-02-2017   5455

 df['vol_all_daily'] = df['Volume'].groupby([df['Date']]).transform('sum')

 Name   Date        Volume   vol_all_daily
 ------------------------------------------
 APL    12-01-2017   1102     5602
 BSC    12-01-2017   4500     5602
 CDF    12-02-2017   5455     5455

However when I want to take the rolling average it doesn't work!

df['vol_all_ma_2']=df['vol_all_daily'].
groupby([df['Date']]).rolling(window=2).mean()

Returns a DataGroupBy that gives error *and becomes too hard to put back into a df column anyways.

 df['vol_all_ma_2'] = 
 df['vol_all_daily'].groupby([df['Date']]).transform('mean').
 rolling(window=2).mean()

This just produces near identical result of vol_all_daily column

Update:

I wasn't taking the just one column per date..The above code will still take multiple dates...Instead I add the .first() to the groupby..Not sure why groupby isnt taking one row per date.

Upvotes: 1

Views: 1132

Answers (1)

muskrat
muskrat

Reputation: 1559

The behavior of what you have written seems correct (Part 1 below), but perhaps you want to be calling something different (Part 2 below).

Part 1: Why what you have written is behaving correctly:

d = {'Name':['APL', 'BSC', 'CDF'],'Date':pd.DatetimeIndex(['2017-12-01', '2017-12-01', '2017-12-02']),'Volume':[1102,4500,5455]}
df = pd.DataFrame(d)
df['vol_all_daily'] = df['Volume'].groupby([df['Date']]).transform('sum')
print(df)
rolling_vol = df['vol_all_daily'].groupby([df['Date']]).rolling(window=2).mean()
print('')
print(rolling_vol)

I get as output:

          Date Name  Volume  vol_all_daily
0 2017-12-01  APL    1102           5602
1 2017-12-01  BSC    4500           5602
2 2017-12-02  CDF    5455           5455

Date         
2017-12-01  0       NaN
            1    5602.0
2017-12-02  2       NaN
Name: vol_all_daily, dtype: float64

To understand why this result rolling_vol is correct, notice that you have first called the groupby, and only after that you have called rolling. That should not produce something that fits with df.

Part 2: What I think you wanted to call (just a rolling average):

If you instead run:

# same as above but without groupby
rolling_vol2 = df['vol_all_daily'].rolling(window=2).mean()
print('')
print(rolling_vol2)

You should get:

0       NaN
1    5602.0
2    5528.5
Name: vol_all_daily, dtype: float64

which looks more like the rolling average you seem to want. To explain that, I suggest reading the details of pandas resampling vs rolling.

Upvotes: 1

Related Questions