Reputation: 413
I am trying to learn about rolling statistics. I created a DataFrame for the following series:
rng = date_range('1/1/2011', periods=72, freq='H')
s = Series(randn(len(rng)), index=rng)
as follows:
import numpy as np
from numpy.random import randn
from pandas import date_range, Series, DataFrame

# 72 hourly timestamps starting at midnight on 1 Jan 2011
r = date_range('1/1/2011', periods=72, freq='H')
r
len(r)
[r[i] for i in range(len(r))]
s = Series(randn(len(r)), index=r)
s
s.plot()
df_new = DataFrame(data=s, columns=['Random Number Generated'])
df_new.diff().hist()
Now I am trying to compute the rolling mean of the series over the last 3 hours and store it in a new column of the DataFrame. I tried to compute the rolling mean first:
df_new['mean'] = rolling_mean(df_new, window=3)
Am I doing this correctly? The result doesn't look like a mean. Can someone explain this, please?
Upvotes: 1
Views: 1945
Reputation: 109520
As long as your index is a DatetimeIndex (as it currently is), you can simply resample the series into 3-hour buckets:
s.resample('3H').mean()
When you use random numbers, it is best to set a seed value so that others can replicate your results.
import numpy as np
import pandas as pd
np.random.seed(0)
s = pd.Series(np.random.randn(72), pd.date_range('1/1/2011', periods=72, freq='H'))
s.plot()
s.resample('3H').mean().plot()
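Note that resampling and a rolling mean are different operations: resampling into 3-hour buckets averages non-overlapping blocks (one value per block), while a rolling mean produces one value per hour, each averaging the current hour and the two before it. A minimal sketch contrasting the two, assuming a recent pandas where the hourly alias is spelled 'h' (uppercase 'H' is deprecated there):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
s = pd.Series(np.random.randn(72),
              index=pd.date_range('1/1/2011', periods=72, freq='h'))  # 'h' = hourly

# resample: non-overlapping 3-hour buckets -> 72 / 3 = 24 values
buckets = s.resample('3h').mean()

# rolling: a sliding 3-row window -> one value per hour (first two are NaN)
rolling = s.rolling(window=3).mean()

print(len(buckets), len(rolling))  # 24 72

# The first bucket (00:00 to 02:59) equals the rolling mean at 02:00,
# because both average the same three hourly values.
print(np.isclose(buckets.iloc[0], rolling.iloc[2]))  # True
```

So resample answers "what was the average in each 3-hour block?", whereas the question's "mean over the last 3 hours, in a new column" is the rolling version.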
Upvotes: 1
Reputation: 51
I reran your code and could not find any problems; it seems to work.
To take the rolling mean over the last 3 hours of hourly data, the window must cover 3 rows, so rolling_mean(df_new, window=5)
should be rolling_mean(df_new, window=3).
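In newer pandas versions the module-level rolling_mean function has been deprecated and removed; the same calculation is written with the .rolling() method. A minimal sketch, assuming the question's column name and a recent pandas (hourly alias 'h'). It also shows a time-based window ('3h'), another way to express "the last 3 hours": it selects rows by timestamp rather than by count, and by default it returns partial means for the first rows instead of NaN.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
idx = pd.date_range('1/1/2011', periods=72, freq='h')
df_new = pd.DataFrame({'Random Number Generated': np.random.randn(72)}, index=idx)
col = df_new['Random Number Generated']

# Fixed-size window: the current row plus the two previous rows.
# The first two rows are NaN because fewer than 3 values are available.
df_new['mean'] = col.rolling(window=3).mean()

# Time-based window: all rows in the half-open interval (t - 3h, t].
# With hourly data this is the same 3 rows, but min_periods defaults to 1,
# so the first two rows hold partial means instead of NaN.
df_new['mean_3h'] = col.rolling('3h').mean()

print(df_new.head())
```

Selecting the single column before calling .rolling() keeps the result a Series, which assigns cleanly to a new column.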
Here is the code I used to verify it:
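import numpy as np

# Recompute the rolling mean by hand; s and df_new come from the question's code.
window = 3
mean_list = []
val_list = []  # values currently inside the window
for i, val in enumerate(s):
    val_list.append(val)
    if i < window - 1:
        # fewer than `window` values seen so far: no mean yet
        mean_list.append(np.nan)
    else:
        mean_list.append(np.mean(np.array(val_list)))
        # drop the oldest value so the window slides forward by one row
        val_list.pop(0)
df_new['mean2'] = mean_list
print(df_new)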
Output:
Random Number Generated mean mean2
2011-01-01 00:00:00 1.457483 NaN NaN
2011-01-01 01:00:00 0.009979 NaN NaN
2011-01-01 02:00:00 0.581128 0.682864 0.682864
2011-01-01 03:00:00 1.905528 0.832212 0.832212
2011-01-01 04:00:00 2.221040 1.569232 1.569232
2011-01-01 05:00:00 0.696211 1.607593 1.607593
2011-01-01 06:00:00 -0.854759 0.687497 0.687497
2011-01-01 07:00:00 -0.033226 -0.063925 -0.063925
2011-01-01 08:00:00 0.097187 -0.263599 -0.263599
2011-01-01 09:00:00 -1.579210 -0.505083 -0.505083
...
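With window=3, each value is the plain average of the current row and the two rows before it, which is why the first two rows are NaN. Checking the first non-NaN row (02:00) by hand against the printed values:

$$\bar{x}_t = \frac{x_t + x_{t-1} + x_{t-2}}{3}$$

$$\bar{x}_{\text{02:00}} = \frac{1.457483 + 0.009979 + 0.581128}{3} = \frac{2.048590}{3} \approx 0.6829$$

This matches the printed 0.682864 up to the rounding of the displayed inputs.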
The results from rolling_mean are consistent with the manually calculated rolling mean values.
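The same comparison can be made programmatically rather than by eye. A minimal sketch, assuming a recent pandas (so it uses the .rolling() method in place of rolling_mean): it computes the 3-point moving average independently with np.convolve and checks that both agree, treating the leading NaNs as equal.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
s = pd.Series(np.random.randn(72),
              index=pd.date_range('1/1/2011', periods=72, freq='h'))
window = 3

# pandas rolling mean: NaN for the first window - 1 rows
pandas_mean = s.rolling(window=window).mean().to_numpy()

# Independent calculation: convolving with [1/3, 1/3, 1/3] in 'valid' mode
# returns only the len(s) - window + 1 fully covered positions ...
valid = np.convolve(s.to_numpy(), np.ones(window) / window, mode='valid')
# ... so pad the front with NaN to line up with the pandas result.
manual_mean = np.concatenate([np.full(window - 1, np.nan), valid])

print(np.allclose(pandas_mean, manual_mean, equal_nan=True))  # True
```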
Another way to check the result is to look at a plot of the calculated rolling mean. pandas.DataFrame provides a plot method that makes drawing the graph easy:
from matplotlib import pyplot
df_new.plot()
pyplot.show()
Upvotes: 1