Reputation: 2635
Here's a piece of code. I don't understand why the last column, rm-5, is NaN for the first 4 rows.
I understand that the first 4 rows of the rm column aren't filled because there isn't enough data yet, but if I shift the column first, shouldn't the calculation be possible?
Similarly, I don't understand why 5 rows, and not 4, are NaN at the end of the rm-5 column.
import pandas as pd
import numpy as np
index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])
df['rm'] = df['A'].rolling(5).mean()
df['rm-5'] = df['A'].shift(-5).rolling(5).mean()
print(df.head(n=8))
print(df.tail(n=8))
                   A        rm      rm-5
2000-01-01  0.109161       NaN       NaN
2000-01-02 -0.360286       NaN       NaN
2000-01-03 -0.092439       NaN       NaN
2000-01-04  0.169439       NaN       NaN
2000-01-05  0.185829  0.002341  0.091736
2000-01-06  0.432599  0.067028  0.295949
2000-01-07 -0.374317  0.064222  0.055903
2000-01-08  1.258054  0.334321 -0.132972

                   A        rm      rm-5
2000-04-02  0.499860 -0.422931 -0.140111
2000-04-03 -0.868718 -0.458962 -0.182373
2000-04-04  0.081059 -0.443494 -0.040646
2000-04-05  0.500275 -0.093048       NaN
2000-04-06 -0.253915 -0.008288       NaN
2000-04-07 -0.159256 -0.140111       NaN
2000-04-08 -1.080027 -0.182373       NaN
2000-04-09  0.789690 -0.040646       NaN
Upvotes: 4
Views: 14952
Reputation: 1702
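You can change the order of operations. Right now you shift first and take the rolling mean afterwards. The shift by -5 moves every value 5 rows up and leaves NaN in the last 5 rows, which is why 5 rows at the end are empty. The rolling mean on the shifted column still needs 5 observations before it produces its first value, so the first 4 rows stay NaN as well. If you take the rolling mean first and shift the result afterwards, the first 4 rows are filled: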
import pandas as pd
import numpy as np
index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])
df['rm'] = df['A'].rolling(5).mean()
df['shift'] = df['A'].shift(-5)
# shift first, then rolling mean: NaN in the first 4 and the last 5 rows
df['rm-5-shift_first'] = df['A'].shift(-5).rolling(5).mean()
# rolling mean first, then shift: NaN only in the last 5 rows
df['rm-5-mean_first'] = df['A'].rolling(5).mean().shift(-5)
print(df.head(n=8))
print(df.tail(n=8))
                   A        rm     shift  rm-5-shift_first  rm-5-mean_first
2000-01-01 -0.120808       NaN  0.830231               NaN         0.184197
2000-01-02  0.029547       NaN  0.047451               NaN         0.187778
2000-01-03  0.002652       NaN  1.040963               NaN         0.395440
2000-01-04 -1.078656       NaN -1.118723               NaN         0.387426
2000-01-05  1.137210 -0.006011  0.469557          0.253896         0.253896
2000-01-06  0.830231  0.184197 -0.390506          0.009748         0.009748
2000-01-07  0.047451  0.187778 -1.624492         -0.324640        -0.324640
2000-01-08  1.040963  0.395440 -1.259306         -0.784694        -0.784694

                   A        rm     shift  rm-5-shift_first  rm-5-mean_first
2000-04-02 -1.283123 -0.270381  0.226257          0.760370         0.760370
2000-04-03  1.369342  0.288072  2.367048          0.959912         0.959912
2000-04-04  0.003363  0.299997  1.143513          1.187941         1.187941
2000-04-05  0.694026  0.400442       NaN               NaN              NaN
2000-04-06  1.508863  0.458494       NaN               NaN              NaN
2000-04-07  0.226257  0.760370       NaN               NaN              NaN
2000-04-08  2.367048  0.959912       NaN               NaN              NaN
2000-04-09  1.143513  1.187941       NaN               NaN              NaN
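To make the NaN counts easier to check by hand, here is a small sketch on a deterministic series instead of random data. It assumes a pandas version that provides the `Series.rolling(...)` method; the 12-row series and the names `s`, `shift_first` and `mean_first` are only for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

# 12 daily values 0.0, 1.0, ..., 11.0 (illustrative data, easy to average by hand)
index = pd.date_range("2000-01-01", periods=12, freq="D")
s = pd.Series(np.arange(12, dtype=float), index=index, name="A")

# Shift first: the last 5 rows become NaN, then the 5-row window
# needs 5 observations, so the first 4 rows are NaN too.
shift_first = s.shift(-5).rolling(5).mean()

# Mean first: only the first 4 rows of the rolling mean are NaN,
# and shifting up by 5 pushes them off the top; the last 5 rows become NaN.
mean_first = s.rolling(5).mean().shift(-5)

result = pd.DataFrame({"shift_first": shift_first, "mean_first": mean_first})
print(result)
print(result.isna().sum())
```

In this sketch both columns agree wherever both are defined. Only the first 4 rows differ, and those are filled only when the mean is taken before the shift.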
For more see:
http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.shift.html
Upvotes: 4