Reputation: 830
I'd like to generate a series that is the incremental (running) mean of a time series. That is, starting from the first date (index 0), the mean stored in row x is the average of the values in rows 0 through x, inclusive.
data
index  value  mean         formula
0      4
1      5
2      6
3      7      5.5          average(0-3)
4      4      5.2          average(0-4)
5      5      5.166666667  average(0-5)
6      6      5.285714286  average(0-6)
7      7      5.5          average(0-7)
I'm hoping there's a way to do this without looping to take advantage of pandas.
Upvotes: 35
Views: 31754
Reputation: 3711
Here's an update for newer versions of pandas (0.18.0 and later):
df['value'].expanding().mean()
or
s.expanding().mean()
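As a minimal sketch of how this might look on the question's data (the DataFrame construction and the column names 'value' and 'mean' are assumptions taken from the question's table, not part of the original answer):

```python
import pandas as pd

# Sample values from the question; the column name 'value' follows its table.
df = pd.DataFrame({"value": [4, 5, 6, 7, 4, 5, 6, 7]})

# expanding() grows the window from the first row up to the current row,
# so row x holds the mean of rows 0..x inclusive.
df["mean"] = df["value"].expanding().mean()
print(df)
```

Note that with the default settings the first three rows get a mean too (4.0, 4.5, 5.0), unlike the blank cells in the question's table.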
Upvotes: 67
Reputation: 5878
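Another approach is to use cumsum() and divide by the cumulative number of items. For example:
In [1]:
import numpy as np
import pandas as pd
s = pd.Series([4, 5, 6, 7, 4, 5, 6, 7])
# running sum divided by the running count 1..n, aligned on s.index
s.cumsum() / pd.Series(np.arange(1, len(s) + 1), s.index)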
Out[1]:
0 4.000000
1 4.500000
2 5.000000
3 5.500000
4 5.200000
5 5.166667
6 5.285714
7 5.500000
dtype: float64
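To sanity-check the cumsum() approach, one could wrap it in a small function and compare it with the built-in expanding mean. This is only a sketch: incremental_mean is a hypothetical helper name, and it assumes the series contains no NaN values.

```python
import numpy as np
import pandas as pd


def incremental_mean(s: pd.Series) -> pd.Series:
    """Running mean as cumulative sum / running count (hypothetical helper)."""
    counts = np.arange(1, len(s) + 1)  # 1, 2, ..., n
    return s.cumsum() / counts


s = pd.Series([4, 5, 6, 7, 4, 5, 6, 7])
manual = incremental_mean(s)
builtin = s.expanding().mean()

# The two results should agree up to floating-point rounding.
print(np.allclose(manual, builtin))
```

If the series may contain missing values, the two approaches can differ, so the comparison above should not be taken as holding in general.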
Upvotes: 12
Reputation: 375735
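As @TomAugspurger points out, you can use expanding_mean. Note that this function was deprecated in pandas 0.18.0 and is not available in current releases, where the equivalent call is s.expanding(min_periods=4).mean(). The second argument is min_periods, the minimum number of observations required before a value is produced:
In [11]: s = pd.Series([4, 5, 6, 7, 4, 5, 6, 7])
In [12]: pd.expanding_mean(s, 4)  # pandas < 0.18 only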
Out[12]:
0 NaN
1 NaN
2 NaN
3 5.500000
4 5.200000
5 5.166667
6 5.285714
7 5.500000
dtype: float64
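On pandas versions where pd.expanding_mean is no longer available, the same output can be reproduced with the method-style API. A minimal sketch, assuming the same series s and that the 4 above is meant as min_periods:

```python
import pandas as pd

s = pd.Series([4, 5, 6, 7, 4, 5, 6, 7])

# min_periods=4 leaves NaN until at least four observations are available,
# which matches the blank 'mean' cells for rows 0-2 in the question.
result = s.expanding(min_periods=4).mean()
print(result)
```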
Upvotes: 17