Reputation: 11
trying to create a function that will solve for the cumulative geometric mean for a vector or for columns of an array.
I can solve the geometric mean of the entire vector/column..simply need to do the following:
from scipy import stats
GM=stats.gmean(X)
print(GM)
When solving the cumulative arithmetic means, I can simply run pd.expanding_mean(x) to get the cumulative mean.
Is there a function I can run which would give me the same result for the geometric mean?
Upvotes: 1
Views: 1462
Reputation: 353199
If your series is pretty small, you can use expanding().apply
with the scipy.stats.gmean you're already using:
In [26]: s = pd.Series(range(1,10))
In [27]: s.expanding().apply(stats.gmean)
Out[27]:
0 1.000000
1 1.414214
2 1.817121
3 2.213364
4 2.605171
5 2.993795
6 3.380015
7 3.764351
8 4.147166
dtype: float64
But this will be very inefficient for longer series:
In [30]: %time egm = pd.concat([s]*1000).expanding().apply(stats.gmean)
CPU times: user 6.5 s, sys: 4 ms, total: 6.5 s
Wall time: 6.53 s
So you might want to make a custom function, something like
def expanding_gmean_log(s):
return np.exp(np.log(s).cumsum() / (np.arange(len(s))+1))
where we work in log-space in preference to something like s.cumprod() ** (1/(np.arange(len(s))+1))
to help avoid overflow in intermediate products.
In [52]: %timeit egm = expanding_gmean_log(pd.concat([s]*1000))
10 loops, best of 3: 71 ms per loop
In [53]: np.allclose(expanding_gmean_log(pd.concat([s]*1000)),
pd.concat([s]*1000).expanding().apply(stats.gmean))
Out[53]: True
Upvotes: 3
Reputation: 114841
You can use a vectorized implementation of the gmean formula. For example,
In [159]: x
Out[159]: array([10, 5, 12, 12, 2, 10])
In [160]: x.cumprod()**(1/np.arange(1., len(x)+1))
Out[160]:
array([ 10. , 7.07106781, 8.43432665, 9.2115587 ,
6.78691638, 7.23980855])
Here's the same result, using gmean()
and a list comprehension:
In [161]: np.array([gmean(x[:k]) for k in range(1, len(x)+1)])
Out[161]:
array([ 10. , 7.07106781, 8.43432665, 9.2115587 ,
6.78691638, 7.23980855])
If it is possible that x.cumprod()
will overflow, you can work with the logarithm of the gmean; see @DSM's answer.
Upvotes: 3