euri10

Reputation: 2635

Pandas rolling on a shifted dataframe

In the code below, I don't understand why the first 4 items of the last column, rm-5, are NaN.

I understand that the first 4 items of the rm column are empty because there is not enough data to fill the window, but if I shift the column first, shouldn't the calculation still be possible?

Similarly, I don't understand why the rm-5 column ends with 5 NaN items rather than 4.

import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])

df['rm']=pd.rolling_mean(df['A'],5)
df['rm-5']=pd.rolling_mean(df['A'].shift(-5),5)

print(df.head(n=8))
print(df.tail(n=8))

                   A        rm      rm-5
2000-01-01  0.109161       NaN       NaN
2000-01-02 -0.360286       NaN       NaN
2000-01-03 -0.092439       NaN       NaN
2000-01-04  0.169439       NaN       NaN
2000-01-05  0.185829  0.002341  0.091736
2000-01-06  0.432599  0.067028  0.295949
2000-01-07 -0.374317  0.064222  0.055903
2000-01-08  1.258054  0.334321 -0.132972
                   A        rm      rm-5
2000-04-02  0.499860 -0.422931 -0.140111
2000-04-03 -0.868718 -0.458962 -0.182373
2000-04-04  0.081059 -0.443494 -0.040646
2000-04-05  0.500275 -0.093048       NaN
2000-04-06 -0.253915 -0.008288       NaN
2000-04-07 -0.159256 -0.140111       NaN
2000-04-08 -1.080027 -0.182373       NaN
2000-04-09  0.789690 -0.040646       NaN

Upvotes: 4

Views: 14952

Answers (1)

Hennep

Reputation: 1702

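You can change the order of operations. Right now you shift first and take the rolling mean afterwards. The shift leaves 5 NaNs at the end of the column, and the rolling mean needs 5 observations before it produces its first value, so the first 4 rows are NaN as well. If you take the rolling mean first and then shift, the leading NaNs are shifted away and only the 5 trailing NaNs remain.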

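import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])

# pd.rolling_mean is the legacy API; newer pandas spells it df['A'].rolling(5).mean()
df['rm'] = pd.rolling_mean(df['A'], 5)
df['shift'] = df['A'].shift(-5)
# shift first, then take the rolling mean
df['rm-5-shift_first'] = pd.rolling_mean(df['A'].shift(-5), 5)
# take the rolling mean first, then shift
df['rm-5-mean_first'] = pd.rolling_mean(df['A'], 5).shift(-5)

print(df.head(n=8))
print(df.tail(n=8))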

                   A        rm     shift  rm-5-shift_first  rm-5-mean_first
2000-01-01 -0.120808       NaN  0.830231               NaN         0.184197
2000-01-02  0.029547       NaN  0.047451               NaN         0.187778
2000-01-03  0.002652       NaN  1.040963               NaN         0.395440
2000-01-04 -1.078656       NaN -1.118723               NaN         0.387426
2000-01-05  1.137210 -0.006011  0.469557          0.253896         0.253896
2000-01-06  0.830231  0.184197 -0.390506          0.009748         0.009748
2000-01-07  0.047451  0.187778 -1.624492         -0.324640        -0.324640
2000-01-08  1.040963  0.395440 -1.259306         -0.784694        -0.784694
                   A        rm     shift  rm-5-shift_first  rm-5-mean_first
2000-04-02 -1.283123 -0.270381  0.226257          0.760370         0.760370
2000-04-03  1.369342  0.288072  2.367048          0.959912         0.959912
2000-04-04  0.003363  0.299997  1.143513          1.187941         1.187941
2000-04-05  0.694026  0.400442       NaN               NaN              NaN
2000-04-06  1.508863  0.458494       NaN               NaN              NaN
2000-04-07  0.226257  0.760370       NaN               NaN              NaN
2000-04-08  2.367048  0.959912       NaN               NaN              NaN
2000-04-09  1.143513  1.187941       NaN               NaN              NaN
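Since `pd.rolling_mean` is no longer available in recent pandas releases, here is a rough sketch of the same comparison written with the `.rolling(...)` method. The seed and the names `shift_first` and `mean_first` are my own choices for illustration and are not part of the code above. It counts the NaNs each ordering produces and checks that the two agree wherever both have a value.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
index = pd.date_range('2000-01-01', periods=100, freq='D')
df = pd.DataFrame({'A': rng.standard_normal(100)}, index=index)

# Shift first, then roll: the window at row i covers rows i-4..i of the
# shifted column, so rows 0..3 have too few observations and rows 95..99
# each see at least one NaN left behind by the shift.
shift_first = df['A'].shift(-5).rolling(5).mean()

# Roll first, then shift: the four leading NaNs of the rolling mean are
# shifted off the top, and the shift leaves five NaNs at the bottom.
mean_first = df['A'].rolling(5).mean().shift(-5)

print(shift_first.isna().sum())  # 4 leading + 5 trailing NaNs
print(mean_first.isna().sum())   # 5 trailing NaNs only

# Where both are defined (rows 4..94) the two orderings give the same mean.
print(np.allclose(shift_first.iloc[4:95], mean_first.iloc[4:95]))
```

So the two orderings only differ in the first four rows: both compute the mean of the same five values of A, but the shift-first version has no complete window there yet.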

For more see:

http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.shift.html

Upvotes: 4
