I'm trying to work out how the "50D" rolling mean is calculated in the following example, because I can't reproduce it myself.
import numpy as np
import pandas as pd
values = [np.nan, -0.00076194, -0.01189744, -0.00062106, -0.00534628, -0.00331957, -0.00337944, 0.00630714, -0.00330637, -0.0045095 , 0.00064805, -0.01268343, -0.00614887, 0.00275657, -0.00440909, 0.00590626, -0.02339615, -0.01969567, 0.01092916, 0.01058356, -0.00758719, 0.00478345, -0.0023437 , 0.00288798, 0.0101454 , -0.0121736 , 0.01344621, -0.01383092, 0.00110947, -0.0004417 , -0.00088272, 0.00809808, -0.00034199, -0.00580861, 0.00621907, -0.00041045, -0.00292563, 0.00129668, -0.00427033, 0.00053285, 0.0021681 , -0.01402614, -0.00388071, 0.00153033, 0.00027945, -0.00899464, -0.00452222, -0.01700942, -0.00115979, -0.01157867, -0.01504971, 0.00732653, -0.01370921, 0.00867434, 0.00237124, 0.02349419, -0.01682703, 0.01467014, 0.01087479, -0.00393254, 0.00534539, -0.00678344, -0.00013054, -0.00503799, -0.00854087, 0.00295728, 0.00931616, 0.01194195, -0.00606532, -0.01017215, -0.003598 , -0.01083424, -0.00556437, -0.00348464, 0.006992 , 0.00278279, 0.00735125, 0.00506792, -0.00582055, -0.00053721, -0.00148132, 0.00828239, 0.00497641, 0.00082642, 0.00834549, 0.01315036, 0.00898724, -0.00256485, -0.00329441, 0.00332798, 0.01377536, -0.00836893, -0.0047126 , -0.00444542, 0.00688868, 0.01143246, 0.00478997, 0.00350752, -0.01044042, -0.01597756]
index = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09', '2015-01-12', '2015-01-13', '2015-01-14', '2015-01-15', '2015-01-16', '2015-01-19', '2015-01-20', '2015-01-21', '2015-01-22', '2015-01-23', '2015-01-26', '2015-01-27', '2015-01-28', '2015-01-29', '2015-01-30', '2015-02-02', '2015-02-03', '2015-02-04', '2015-02-05', '2015-02-06', '2015-02-09', '2015-02-10', '2015-02-11', '2015-02-12', '2015-02-13', '2015-02-16', '2015-02-17', '2015-02-18', '2015-02-19', '2015-02-20', '2015-02-23', '2015-02-24', '2015-02-25', '2015-02-26', '2015-02-27', '2015-03-02', '2015-03-03', '2015-03-04', '2015-03-05', '2015-03-06', '2015-03-09', '2015-03-10', '2015-03-11', '2015-03-12', '2015-03-13', '2015-03-16', '2015-03-17', '2015-03-18', '2015-03-19', '2015-03-20', '2015-03-23', '2015-03-24', '2015-03-25', '2015-03-26', '2015-03-27', '2015-03-30', '2015-03-31', '2015-04-01', '2015-04-02', '2015-04-03', '2015-04-06', '2015-04-07', '2015-04-08', '2015-04-09', '2015-04-10', '2015-04-13', '2015-04-14', '2015-04-15', '2015-04-16', '2015-04-17', '2015-04-20', '2015-04-21', '2015-04-22', '2015-04-23', '2015-04-24', '2015-04-27', '2015-04-28', '2015-04-29', '2015-04-30', '2015-05-01', '2015-05-04', '2015-05-05', '2015-05-06', '2015-05-07', '2015-05-08', '2015-05-11', '2015-05-12', '2015-05-13', '2015-05-14', '2015-05-15', '2015-05-18', '2015-05-19', '2015-05-20'])
df = pd.DataFrame(data=values, index=index, columns=["returns"])
df["rolling_50"] = df["returns"].rolling(50).mean()
df["rolling_50D"] = df["returns"].rolling("50D").mean()
df.dropna()
returns rolling_50 rolling_50D
2015-03-12 -0.015050 -0.002742 -0.002511
2015-03-13 0.007327 -0.002580 -0.002472
2015-03-16 -0.013709 -0.002616 -0.002203
2015-03-17 0.008674 -0.002430 -0.001415
2015-03-18 0.002371 -0.002276 -0.001653
2015-03-19 0.023494 -0.001740 -0.001294
2015-03-20 -0.016827 -0.002009 -0.001551
2015-03-23 0.014670 -0.001841 -0.001276
2015-03-24 0.010875 -0.001558 -0.000909
2015-03-25 -0.003933 -0.001546 -0.001098
2015-03-26 0.005345 -0.001452 -0.001232
2015-03-27 -0.006783 -0.001334 -0.001082
2015-03-30 -0.000131 -0.001214 -0.001459
2015-03-31 -0.005038 -0.001370 -0.001215
2015-04-01 -0.008541 -0.001452 -0.001483
2015-04-02 0.002957 -0.001511 -0.001388
2015-04-03 0.009316 -0.000857 -0.001105
2015-04-06 0.011942 -0.000224 -0.000998
2015-04-07 -0.006065 -0.000564 -0.001157
2015-04-08 -0.010172 -0.000979 -0.001279
2015-04-09 -0.003598 -0.000900 -0.001551
2015-04-10 -0.010834 -0.001212 -0.001841
2015-04-13 -0.005564 -0.001276 -0.001914
2015-04-14 -0.003485 -0.001404 -0.002047
2015-04-15 0.006992 -0.001467 -0.001734
2015-04-16 0.002783 -0.001168 -0.001672
2015-04-17 0.007351 -0.001290 -0.001528
2015-04-20 0.005068 -0.000912 -0.000997
2015-04-21 -0.005821 -0.001050 -0.001051
2015-04-22 -0.000537 -0.001052 -0.001109
2015-04-23 -0.001481 -0.001064 -0.001157
2015-04-24 0.008282 -0.001060 -0.000678
2015-04-27 0.004976 -0.000954 -0.000414
2015-04-28 0.000826 -0.000821 0.000082
2015-04-29 0.008345 -0.000779 0.000346
2015-04-30 0.013150 -0.000508 0.001033
2015-05-01 0.008987 -0.000269 0.001700
2015-05-04 -0.002565 -0.000347 0.001426
2015-05-05 -0.003294 -0.000327 0.001715
2015-05-06 0.003328 -0.000271 0.001566
2015-05-07 0.013775 -0.000039 0.001883
2015-05-08 -0.008369 0.000074 0.000998
2015-05-11 -0.004713 0.000057 0.001335
2015-05-12 -0.004445 -0.000062 0.000804
2015-05-13 0.006889 0.000070 0.000693
2015-05-14 0.011432 0.000479 0.001120
2015-05-15 0.004790 0.000665 0.001104
2015-05-18 0.003508 0.001075 0.001390
2015-05-19 -0.010440 0.000890 0.001104
2015-05-20 -0.015978 0.000802 0.000800
As you can see, once 50 rows are available (the preceding rows of "rolling_50" are NaN), the values in "rolling_50" differ from those in "rolling_50D". Shouldn't both be calculated the same way, given that each row is one day?
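One way to probe the difference is to count how many rows each "50D" window actually contains. The following is a minimal sketch on synthetic data, not the returns above: the names `idx`, `s`, `by_rows`, `by_days`, `n_obs` and `window` are made up for illustration, and the index is assumed to contain weekdays only, like the one in the question. It relies on the documented pandas behaviour that an offset window such as "50D" covers a span of time ending at each row's timestamp (by default the interval from t minus 50 days, exclusive, up to t, inclusive), whereas an integer window covers a fixed number of rows.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the data above: 100 weekdays starting 2015-01-01.
idx = pd.bdate_range("2015-01-01", periods=100)
s = pd.Series(np.arange(100.0), index=idx)

by_rows = s.rolling(50).mean()      # window = the last 50 rows
by_days = s.rolling("50D").mean()   # window = the last 50 calendar days
n_obs = s.rolling("50D").count()    # rows that fall inside each 50-day window

# Rebuild the last "50D" window by hand: the interval (t - 50 days, t].
t = s.index[-1]
window = s[(s.index > t - pd.Timedelta(days=50)) & (s.index <= t)]

print(len(window), int(n_obs.iloc[-1]))       # rows in the last 50-day window
print(window.mean(), by_days.iloc[-1])        # manual mean vs rolling("50D")
print(s.iloc[-50:].mean(), by_rows.iloc[-1])  # manual mean vs rolling(50)
```

If the index skips weekends, a 50-day window holds fewer than 50 rows, so the two means are taken over different sets of values. Offset windows also default to `min_periods=1`, which would explain why "rolling_50D" has values in the early rows where "rolling_50" is still NaN.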