Michael

Reputation: 7397

pandas expanding apply for regression beta

Hi, I'm trying to calculate regression betas over an expanding window in pandas. I have the following function to calculate beta:

  def beta(row, col1, col2):
      return numpy.cov(row[col1],row[col2]) / numpy.var(row[col1])

And I've tried the following to get the expanding beta on my dataframe df:

pandas.expanding_apply(df, beta, col1='col1', col2='col2')
pandas.expanding_apply(df, beta, kwargs={'col1':'col1', 'col2':'col2'})
df.expanding.apply(...)

However, none of them work. I either get an error saying the kwargs aren't being passed through or, if I hardcode the column names in the beta function, I get

*** IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Thanks

Example:

def beta(row, col1, col2):
    return numpy.cov(row[col1],row[col2]) / numpy.var(row[col1])
df = pandas.DataFrame({'a':[1,2,3,4,5],'b':[.1,5,.3,.5,6]})
pandas.expanding_apply(df, beta, col1='a', col2='b')
pandas.expanding_apply(df, beta, kwargs={'col1':'a', 'col2':'b'})

Both of those return errors

Upvotes: 3

Views: 2456

Answers (1)

Brad Solomon

Reputation: 40918

I've run into this issue when trying to calculate betas for rolling multiple regression, very similar to what you're doing (see here). The key issue is that with Expanding.apply(func, args=(), kwargs={}), the func param

Must produce a single value from an ndarray input. *args and **kwargs are passed to the function.

[source]

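And there is really no way to accommodate a two-column calculation using expanding.apply: func receives each column separately as a 1-D ndarray, so it never sees both columns at once. (Note: as mentioned, expanding_apply is deprecated.)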
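To make the point above concrete, here is a minimal sketch that records what the function actually receives. It assumes a pandas version where apply accepts the raw argument; the probe function and the seen list are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [.1, 5, .3, .5, 6]})

seen = []  # (type name, number of dimensions, length) for each call

def probe(window):
    # Hypothetical helper: record what pandas hands to the function
    seen.append((type(window).__name__, window.ndim, len(window)))
    return window.sum()  # the function must return a single scalar

# raw=True asks pandas to pass plain NumPy arrays to the function
result = df.expanding().apply(probe, raw=True)
print(seen[:3])
print(result)
```

Each call gets a 1-D array from a single column, with no column labels attached. Indexing such an array with a string like 'a' is what raises the IndexError quoted in the question.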

Below is a workaround. It's more computationally expensive (will eat up memory) but will get you to your output. It creates a list of expanding-window NumPy arrays and then calculates a beta over each.

from pandas_datareader.data import DataReader as dr
import numpy as np
import pandas as pd

df = (dr(['GOOG', 'SPY'], 'google')['Close']
      .pct_change()
      .dropna())

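# i is the asset, m is market/index
# [0, 1] grabs cov_i,m from the covariance matrix
# Note: np.cov defaults to ddof=1 while np.var defaults to ddof=0;
# pass ddof=1 to np.var if you want both estimates on the same basis
def beta(i, m):
    return np.cov(i, m)[0, 1] / np.var(m)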

def expwins(x, min_periods):
    return [x[:i] for i in range(min_periods, x.shape[0] + 1)]

# Example:
# arr = np.arange(10).reshape(5, 2)
# print(expwins(arr, min_periods=3)[1])  # the 2nd window of the set
# array([[0, 1],
#        [2, 3],
#        [4, 5],
#        [6, 7]])

min_periods = 21
# Create "blocks" of expanding windows
wins = expwins(df.values, min_periods=min_periods)
# Calculate a beta (single scalar val.) for each
betas = [beta(win[:, 0], win[:, 1]) for win in wins]
betas = pd.Series(betas, index=df.index[min_periods - 1:])

print(betas)
Date
2010-02-03    0.77572
2010-02-04    0.74769
2010-02-05    0.76692
2010-02-08    0.74301
2010-02-09    0.74741
2010-02-10    0.74635
2010-02-11    0.74735
2010-02-12    0.74605
2010-02-16    0.78521
2010-02-17    0.77619
2010-02-18    0.79188
2010-02-19    0.78952

2017-06-19    0.97387
2017-06-20    0.97390
2017-06-21    0.97386
2017-06-22    0.97387
2017-06-23    0.97391
2017-06-26    0.97389
2017-06-27    0.97482
2017-06-28    0.97508
2017-06-29    0.97594
2017-06-30    0.97584
2017-07-03    0.97575
2017-07-05    0.97588
dtype: float64
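As a possible lighter-weight alternative to building a list of windows, the same ratio can be sketched with pandas' built-in expanding covariance and variance. This is not the approach used above, just a sketch under assumptions: the small frame below contains made-up numbers, and the column names 'i' (asset) and 'm' (market) are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up illustrative returns: 'i' is the asset, 'm' is the market/index
df = pd.DataFrame({'i': [0.01, -0.02, 0.015, 0.03, -0.01, 0.02],
                   'm': [0.008, -0.015, 0.01, 0.02, -0.005, 0.012]})

min_periods = 3
exp_i = df['i'].expanding(min_periods=min_periods)
exp_m = df['m'].expanding(min_periods=min_periods)

# Expanding cov and var both default to ddof=1, so the ratio is the
# slope of a simple regression of i on m over each expanding window
betas = (exp_i.cov(df['m']) / exp_m.var()).dropna()
print(betas)
```

Because both pieces use ddof=1 here, the values will differ slightly from a beta computed with np.var at its default ddof=0, with the gap shrinking as the window grows.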

Upvotes: 2
