Reputation: 7476
To keep the notation in my code generic, I want to express my original time series as a moving average with a window of 1. Unexpectedly, the result of pandas' pd.rolling_mean is not exactly identical to the original series:
import pandas as pd
import numpy as np
np.random.seed(1)
ts = pd.Series(np.random.rand(1000))
mavg = pd.rolling_mean(ts, 1)
(ts - mavg).describe()
Out[120]:
count    1.000000e+03
mean     6.284973e-16
std      3.877250e-16
min     -3.330669e-16
25%      3.330669e-16
50%      5.551115e-16
75%      8.881784e-16
max      1.554312e-15
dtype: float64
any((ts - mavg).dropna()>0)
Out[121]: True
Should this be considered a bug or am I missing something?
Upvotes: 1
Views: 209
Reputation: 8831
The difference comes from floating-point arithmetic. Because of how floats are represented internally, the result of a calculation is rounded at each step, so two mathematically equivalent computations need not give bit-identical results. Up to these rounding errors, your numbers are identical.
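To illustrate how such rounding errors can arise even for a window of 1, here is a minimal sketch. It assumes the rolling mean is computed from a running sum (add the value entering the window, subtract the value leaving it), which is a common approach for rolling-window code. Whether pandas does exactly this internally is an assumption, and `running_mean` is a hypothetical helper, not a pandas function.

```python
import numpy as np

def running_mean(values, window=1):
    """Rolling mean via a running sum (hypothetical helper, not pandas API)."""
    out = np.full(len(values), np.nan)
    total = 0.0
    for i, v in enumerate(values):
        total += v                      # add the value entering the window
        if i >= window:
            total -= values[i - window] # subtract the value leaving the window
        if i >= window - 1:
            out[i] = total / window
    return out

np.random.seed(1)
ts = np.random.rand(1000)
mavg = running_mean(ts, window=1)

# Largest absolute deviation from the original series; with this scheme
# it is typically tiny but not zero, because each add/subtract is rounded.
max_abs_diff = np.max(np.abs(ts - mavg))
print(max_abs_diff)
```

Mathematically, `total + v - previous` equals `v`, but each intermediate sum is rounded to the nearest representable float, so small residuals can accumulate.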
Upvotes: 0
Reputation: 85442
The differences are tiny and well within the range of numerical "noise" caused by how floats work. Floats cannot represent all numbers exactly, so calculations with floats often leave small residuals behind. Instead of testing for exact equality, compare against a small epsilon:
>>> any((ts - mavg).dropna().abs() > 1e-14)
False
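The same tolerance check can also be written with NumPy's `np.allclose`, as in the sketch below. To stay independent of any particular pandas version, the rounding noise is simulated here by adding and subtracting 1.0. That stand-in and the tolerance `1e-14` are assumptions chosen for illustration, not part of the original question.

```python
import numpy as np

rng = np.random.RandomState(1)
ts = rng.rand(1000)

# Stand-in for a computation that is mathematically a no-op
# but picks up rounding error along the way.
noisy = (ts + 1.0) - 1.0

# Exact comparison fails because of the rounding noise ...
exact_match = np.array_equal(ts, noisy)

# ... while a comparison with an absolute tolerance succeeds.
close_match = np.allclose(ts, noisy, rtol=0.0, atol=1e-14)

print(exact_match, close_match)
```

Setting `rtol=0.0` makes the check purely absolute, which mirrors the `abs() > 1e-14` test above. For values of widely varying magnitude, a relative tolerance may be the better choice.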
Upvotes: 3