Stephen
Stephen

Reputation: 11

calculating cumulative geometric mean

trying to create a function that will solve for the cumulative geometric mean for a vector or for columns of an array.  

I can solve the geometric mean of the entire vector/column..simply need to do the following: 

from scipy import stats
GM=stats.gmean(X)
print(GM)

When solving the cumulative arithmetic means, I can simply run pd.expanding_mean(x) to get the cumulative mean.

Is there a function I can run which would give me the same result for the geometric mean?

Upvotes: 1

Views: 1462

Answers (2)

DSM
DSM

Reputation: 353199

If your series is pretty small, you can use expanding().apply with the scipy.stats.gmean you're already using:

In [26]: s = pd.Series(range(1,10))

In [27]: s.expanding().apply(stats.gmean)
Out[27]: 
0    1.000000
1    1.414214
2    1.817121
3    2.213364
4    2.605171
5    2.993795
6    3.380015
7    3.764351
8    4.147166
dtype: float64

But this will be very inefficient for longer series:

In [30]: %time egm = pd.concat([s]*1000).expanding().apply(stats.gmean)
CPU times: user 6.5 s, sys: 4 ms, total: 6.5 s
Wall time: 6.53 s

So you might want to make a custom function, something like

def expanding_gmean_log(s):
    return np.exp(np.log(s).cumsum() / (np.arange(len(s))+1))

where we work in log-space in preference to something like s.cumprod() ** (1/(np.arange(len(s))+1)) to help avoid overflow in intermediate products.

In [52]: %timeit egm = expanding_gmean_log(pd.concat([s]*1000))
10 loops, best of 3: 71 ms per loop

In [53]: np.allclose(expanding_gmean_log(pd.concat([s]*1000)),
                     pd.concat([s]*1000).expanding().apply(stats.gmean))
Out[53]: True

Upvotes: 3

Warren Weckesser
Warren Weckesser

Reputation: 114841

You can use a vectorized implementation of the gmean formula. For example,

In [159]: x
Out[159]: array([10,  5, 12, 12,  2, 10])

In [160]: x.cumprod()**(1/np.arange(1., len(x)+1))
Out[160]: 
array([ 10.        ,   7.07106781,   8.43432665,   9.2115587 ,
         6.78691638,   7.23980855])

Here's the same result, using gmean() and a list comprehension:

In [161]: np.array([gmean(x[:k]) for k in range(1, len(x)+1)])
Out[161]: 
array([ 10.        ,   7.07106781,   8.43432665,   9.2115587 ,
         6.78691638,   7.23980855])

If it is possible that x.cumprod() will overflow, you can work with the logarithm of the gmean; see @DSM's answer.

Upvotes: 3

Related Questions