Sarah

Reputation: 413

Rolling mean of Time series Pandas

I am trying to learn about rolling statistics. I created a DataFrame from the following series:

rng = date_range('1/1/2011', periods=72, freq='H')
s = Series(randn(len(rng)), index=rng)

as follows:

import numpy as np
from numpy.random import randn
from pandas import Series, DataFrame, date_range
r = date_range('1/1/2011', periods=72, freq='H')
r
len(r)
[r[i] for i in range(len(r))]
s = Series(randn(len(r)), index=r)
s
s.plot()
df_new = DataFrame(data = s, columns=['Random Number Generated'])
df_new.diff().hist()

Now I am trying to find the rolling mean of the series over the last 3 hours in a new column on a DataFrame. I tried to find the rolling mean first:

df_new['mean'] = rolling_mean(df_new, window=3)

Is this correct? The result doesn't look like a mean. Can someone please explain this to me?

Upvotes: 1

Views: 1945

Answers (2)

Alexander

Reputation: 109520

As long as your index is a DatetimeIndex (as it currently is), you can just use resample and take the mean of each 3-hour bin:

s.resample('3H').mean()

When you use random numbers, it is best to set a seed value so that others can replicate your results.

import numpy as np
import pandas as pd

np.random.seed(0)
s = pd.Series(np.random.randn(72), pd.date_range('1/1/2011', periods=72, freq='H'))
s.plot(); s.resample('3H').mean().plot()
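
Note that resampling and a rolling window answer slightly different questions: resampling produces one value per non-overlapping 3-hour bin, while a rolling window produces one value per original row. A minimal sketch of the difference, assuming a pandas version that provides `Series.rolling` and an explicit `.mean()` on the resampler (the names `binned` and `rolled` are only illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the numbers are reproducible
# "h" is the hourly alias; older pandas versions also accept "H".
idx = pd.date_range("2011-01-01", periods=72, freq="h")
s = pd.Series(np.random.randn(72), index=idx)

# Non-overlapping 3-hour bins: one value per bin.
binned = s.resample("3h").mean()

# Trailing window of 3 observations: one value per original row.
rolled = s.rolling(window=3).mean()

print(len(binned), len(rolled))
```

With 72 hourly points, the resampled series has 24 entries and the rolling series keeps all 72, the first two of which are NaN because the window is not yet full.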

Upvotes: 1

sy2

Reputation: 51

I reran your code and could not find any problems; it works as expected. If you want the rolling mean over the last 3 hours, use rolling_mean(df_new, window=3) rather than rolling_mean(df_new, window=5).
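
The top-level rolling_mean function was deprecated in pandas 0.18 and is no longer available in later releases. A minimal sketch of the equivalent call with the `.rolling()` method, assuming a recent pandas version; the column name `mean_3h` is only illustrative and not part of the original question:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the numbers are reproducible
idx = pd.date_range("2011-01-01", periods=72, freq="h")
df_new = pd.DataFrame({"Random Number Generated": np.random.randn(72)}, index=idx)

col = df_new["Random Number Generated"]

# Count-based window: mean of the current row and the two rows before it.
# The first two rows are NaN because the window is not yet full.
df_new["mean"] = col.rolling(window=3).mean()

# Time-based window: mean of all rows in the 3 hours ending at each timestamp.
# Offset windows default to min_periods=1, so the first rows are not NaN.
df_new["mean_3h"] = col.rolling("3h").mean()

print(df_new.head())
```

For evenly spaced hourly data the two columns agree from the third row onward; the time-based form is mainly useful when timestamps are irregular.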

Here is my code for the verification.

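import numpy as np

# Assumes s and df_new from the question are already defined.
window = 3
mean_list = []
val_list = []  # holds the values currently inside the window
for i, val in enumerate(s):
    val_list.append(val)
    if i < window - 1:
        # Not enough values yet to fill the window
        mean_list.append(np.nan)
    else:
        mean_list.append(np.mean(val_list))
        val_list.pop(0)  # drop the oldest value before the next step
df_new['mean2'] = mean_list
print(df_new)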

Output:

                     Random Number Generated      mean     mean2
2011-01-01 00:00:00                 1.457483       NaN       NaN
2011-01-01 01:00:00                 0.009979       NaN       NaN
2011-01-01 02:00:00                 0.581128  0.682864  0.682864
2011-01-01 03:00:00                 1.905528  0.832212  0.832212
2011-01-01 04:00:00                 2.221040  1.569232  1.569232
2011-01-01 05:00:00                 0.696211  1.607593  1.607593
2011-01-01 06:00:00                -0.854759  0.687497  0.687497
2011-01-01 07:00:00                -0.033226 -0.063925 -0.063925
2011-01-01 08:00:00                 0.097187 -0.263599 -0.263599
2011-01-01 09:00:00                -1.579210 -0.505083 -0.505083
...

The results from rolling_mean are consistent with the manually calculated rolling mean values.
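
The same check can also be written without an explicit loop. The following is a sketch only, assuming a recent pandas version and using `np.convolve` with an averaging kernel as the independent reference; the data is regenerated with a fixed seed, so the values differ from the output above:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the numbers are reproducible
values = np.random.randn(72)
s = pd.Series(values, index=pd.date_range("2011-01-01", periods=72, freq="h"))

window = 3
rolled = s.rolling(window=window).mean()

# Averaging kernel; mode="valid" keeps only positions where the full window fits.
manual = np.convolve(values, np.ones(window) / window, mode="valid")

# The first window - 1 entries of the pandas result are NaN, so skip them.
matches = np.allclose(rolled.to_numpy()[window - 1:], manual)
print(matches)
```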

Another way to check the result is to plot the calculated rolling mean. pandas.DataFrame provides a plot method that makes this easy.

from matplotlib import pyplot
df_new.plot()
pyplot.show()

Upvotes: 1
