Armando Contestabile

Reputation: 300

Pandas Dataframe rolling mean of last 50 daily values differs from rolling("50D").mean()

I'm trying to work out how the "50D" rolling mean is calculated in the following example, because I cannot reproduce it myself.

import numpy as np  # needed for np.nan in the values list below
import pandas as pd

values = [np.nan, -0.00076194, -0.01189744, -0.00062106, -0.00534628, -0.00331957, -0.00337944, 0.00630714, -0.00330637, -0.0045095 , 0.00064805, -0.01268343, -0.00614887, 0.00275657, -0.00440909, 0.00590626, -0.02339615, -0.01969567, 0.01092916, 0.01058356, -0.00758719, 0.00478345, -0.0023437 , 0.00288798, 0.0101454 , -0.0121736 , 0.01344621, -0.01383092, 0.00110947, -0.0004417 , -0.00088272, 0.00809808, -0.00034199, -0.00580861, 0.00621907, -0.00041045, -0.00292563, 0.00129668, -0.00427033, 0.00053285, 0.0021681 , -0.01402614, -0.00388071, 0.00153033, 0.00027945, -0.00899464, -0.00452222, -0.01700942, -0.00115979, -0.01157867, -0.01504971, 0.00732653, -0.01370921, 0.00867434, 0.00237124, 0.02349419, -0.01682703, 0.01467014, 0.01087479, -0.00393254, 0.00534539, -0.00678344, -0.00013054, -0.00503799, -0.00854087, 0.00295728, 0.00931616, 0.01194195, -0.00606532, -0.01017215, -0.003598 , -0.01083424, -0.00556437, -0.00348464, 0.006992 , 0.00278279, 0.00735125, 0.00506792, -0.00582055, -0.00053721, -0.00148132, 0.00828239, 0.00497641, 0.00082642, 0.00834549, 0.01315036, 0.00898724, -0.00256485, -0.00329441, 0.00332798, 0.01377536, -0.00836893, -0.0047126 , -0.00444542, 0.00688868, 0.01143246, 0.00478997, 0.00350752, -0.01044042, -0.01597756]
index = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09', '2015-01-12', '2015-01-13', '2015-01-14', '2015-01-15', '2015-01-16', '2015-01-19', '2015-01-20', '2015-01-21', '2015-01-22', '2015-01-23', '2015-01-26', '2015-01-27', '2015-01-28', '2015-01-29', '2015-01-30', '2015-02-02', '2015-02-03', '2015-02-04', '2015-02-05', '2015-02-06', '2015-02-09', '2015-02-10', '2015-02-11', '2015-02-12', '2015-02-13', '2015-02-16', '2015-02-17', '2015-02-18', '2015-02-19', '2015-02-20', '2015-02-23', '2015-02-24', '2015-02-25', '2015-02-26', '2015-02-27', '2015-03-02', '2015-03-03', '2015-03-04', '2015-03-05', '2015-03-06', '2015-03-09', '2015-03-10', '2015-03-11', '2015-03-12', '2015-03-13', '2015-03-16', '2015-03-17', '2015-03-18', '2015-03-19', '2015-03-20', '2015-03-23', '2015-03-24', '2015-03-25', '2015-03-26', '2015-03-27', '2015-03-30', '2015-03-31', '2015-04-01', '2015-04-02', '2015-04-03', '2015-04-06', '2015-04-07', '2015-04-08', '2015-04-09', '2015-04-10', '2015-04-13', '2015-04-14', '2015-04-15', '2015-04-16', '2015-04-17', '2015-04-20', '2015-04-21', '2015-04-22', '2015-04-23', '2015-04-24', '2015-04-27', '2015-04-28', '2015-04-29', '2015-04-30', '2015-05-01', '2015-05-04', '2015-05-05', '2015-05-06', '2015-05-07', '2015-05-08', '2015-05-11', '2015-05-12', '2015-05-13', '2015-05-14', '2015-05-15', '2015-05-18', '2015-05-19', '2015-05-20'])
df = pd.DataFrame(data=values, index=index, columns=["returns"])
df["rolling_50"] = df["returns"].rolling(50).mean()
df["rolling_50D"] = df["returns"].rolling("50D").mean()
df.dropna()
            returns     rolling_50  rolling_50D
2015-03-12  -0.015050   -0.002742   -0.002511
2015-03-13  0.007327    -0.002580   -0.002472
2015-03-16  -0.013709   -0.002616   -0.002203
2015-03-17  0.008674    -0.002430   -0.001415
2015-03-18  0.002371    -0.002276   -0.001653
2015-03-19  0.023494    -0.001740   -0.001294
2015-03-20  -0.016827   -0.002009   -0.001551
2015-03-23  0.014670    -0.001841   -0.001276
2015-03-24  0.010875    -0.001558   -0.000909
2015-03-25  -0.003933   -0.001546   -0.001098
2015-03-26  0.005345    -0.001452   -0.001232
2015-03-27  -0.006783   -0.001334   -0.001082
2015-03-30  -0.000131   -0.001214   -0.001459
2015-03-31  -0.005038   -0.001370   -0.001215
2015-04-01  -0.008541   -0.001452   -0.001483
2015-04-02  0.002957    -0.001511   -0.001388
2015-04-03  0.009316    -0.000857   -0.001105
2015-04-06  0.011942    -0.000224   -0.000998
2015-04-07  -0.006065   -0.000564   -0.001157
2015-04-08  -0.010172   -0.000979   -0.001279
2015-04-09  -0.003598   -0.000900   -0.001551
2015-04-10  -0.010834   -0.001212   -0.001841
2015-04-13  -0.005564   -0.001276   -0.001914
2015-04-14  -0.003485   -0.001404   -0.002047
2015-04-15  0.006992    -0.001467   -0.001734
2015-04-16  0.002783    -0.001168   -0.001672
2015-04-17  0.007351    -0.001290   -0.001528
2015-04-20  0.005068    -0.000912   -0.000997
2015-04-21  -0.005821   -0.001050   -0.001051
2015-04-22  -0.000537   -0.001052   -0.001109
2015-04-23  -0.001481   -0.001064   -0.001157
2015-04-24  0.008282    -0.001060   -0.000678
2015-04-27  0.004976    -0.000954   -0.000414
2015-04-28  0.000826    -0.000821   0.000082
2015-04-29  0.008345    -0.000779   0.000346
2015-04-30  0.013150    -0.000508   0.001033
2015-05-01  0.008987    -0.000269   0.001700
2015-05-04  -0.002565   -0.000347   0.001426
2015-05-05  -0.003294   -0.000327   0.001715
2015-05-06  0.003328    -0.000271   0.001566
2015-05-07  0.013775    -0.000039   0.001883
2015-05-08  -0.008369   0.000074    0.000998
2015-05-11  -0.004713   0.000057    0.001335
2015-05-12  -0.004445   -0.000062   0.000804
2015-05-13  0.006889    0.000070    0.000693
2015-05-14  0.011432    0.000479    0.001120
2015-05-15  0.004790    0.000665    0.001104
2015-05-18  0.003508    0.001075    0.001390
2015-05-19  -0.010440   0.000890    0.001104
2015-05-20  -0.015978   0.000802    0.000800

As you can see, once 50 rows are available (the preceding rows are NaN in the "rolling_50" column, so dropna() removes them), the values in "rolling_50" differ from those in "rolling_50D". Shouldn't both be calculated the same way, since each row is one day?
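One way to check what each window actually covers is to count the rows in it and rebuild one "50D" mean by hand. The sketch below is only an illustration, not the original data: it assumes a synthetic business-day index built with pd.bdate_range and random returns, and it assumes the default closed="right" behaviour for offset windows, i.e. the window for timestamp t is (t - 50 days, t]. The names rows_in_50d, in_window and manual_50d are made up for this example. Because the index skips weekends, 50 calendar days hold fewer than 50 rows, which would explain why the two columns differ. Offset windows also default to min_periods=1, whereas rolling(50) needs 50 observations before it returns a value.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption): 100 business days
# starting 2015-01-01, random returns, first value missing.
index = pd.bdate_range("2015-01-01", periods=100)
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, size=len(index)), index=index, name="returns")
returns.iloc[0] = np.nan

rolling_50 = returns.rolling(50).mean()      # window = last 50 rows
rolling_50d = returns.rolling("50D").mean()  # window = last 50 calendar days

# Number of non-NaN rows that fall inside each 50-calendar-day window.
rows_in_50d = returns.rolling("50D").count()

# Rebuild the "50D" mean for the last timestamp by hand,
# assuming the window is (t - 50 days, t].
t = returns.index[-1]
in_window = (returns.index > t - pd.Timedelta(days=50)) & (returns.index <= t)
manual_50d = returns[in_window].mean()

print(int(rows_in_50d.iloc[-1]))                     # rows in the last 50D window
print(np.isclose(manual_50d, rolling_50d.iloc[-1]))  # hand-built mean vs rolling("50D")
print(np.isclose(returns.iloc[-50:].mean(), rolling_50.iloc[-1]))  # rolling(50) = last 50 rows
```

With this synthetic index the last "50D" window holds 36 rows rather than 50, so rolling("50D") averages a shorter history than rolling(50). If the goal is 50 observations regardless of calendar gaps, the integer window is the one that does that.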

Upvotes: 0

Views: 25

Answers (0)