rentec

Reputation: 31

Python Pandas - Rolling regressions for multiple columns in a dataframe

I have a large dataframe containing daily time series of prices for 10,000 columns (stocks) over a period of 20 years (5,000 rows x 10,000 columns). Missing observations are indicated by NaNs.

            0      1      2      3      4      5       6      7      8      \
31.12.2009  30.75  66.99    NaN    NaN    NaN    NaN  393.87  57.04    NaN   
01.01.2010  30.75  66.99    NaN    NaN    NaN    NaN  393.87  57.04    NaN   
04.01.2010  31.85  66.99    NaN    NaN    NaN    NaN  404.93  57.04    NaN   
05.01.2010  33.26  66.99    NaN    NaN    NaN    NaN  400.00  58.75    NaN   
06.01.2010  33.26  66.99    NaN    NaN    NaN    NaN  400.00  58.75    NaN   

Now I want to run a rolling regression with a 250-day window for each column over the whole sample period and save the slope coefficient in another dataframe.
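For reference, the regression described here (and implemented by regress() in the code below) has only a constant and the within-window time index s = 0, ..., n-1 as regressors, so the slope for the window ending at row t has a closed form. With n = 250 and ȳ_t the mean of the window:

$$
y_{t-n+1+s} = \alpha_t + \beta_t\, s + \varepsilon_{t,s}, \qquad s = 0, \dots, n-1
$$

$$
\hat\beta_t = \frac{\sum_{s=0}^{n-1} (s - \bar s)\,(y_{t-n+1+s} - \bar y_t)}{\sum_{s=0}^{n-1} (s - \bar s)^2},
\qquad \bar s = \frac{n-1}{2},
\qquad \sum_{s=0}^{n-1} (s - \bar s)^2 = \frac{n(n^2-1)}{12}
$$

The denominator is the same for every window and every column, so only the numerator has to be computed per column and per window.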

Iterating over the columns and rows with two nested for loops isn't very efficient, so I tried the following, but I get this error message:

import numpy as np
import statsmodels.api as sm

def regress(start, end):
    y = df_returns.iloc[start:end].values

    if np.isnan(y).any() == False:
        X = np.arange(len(y))
        X = sm.add_constant(X, has_constant="add")
        model = sm.OLS(y,X).fit()

        return model.params[1]

    else:
        return np.nan


regression_window = 250

for t in (regression_window, len(df_returns.index)):

    df_coef[t] = df_returns.apply(regress(t-regression_window, t), axis=1)
TypeError: ("'float' object is not callable", 'occurred at index 31.12.2009')

Upvotes: 1

Views: 2189

Answers (1)

Mayeul sgc

Reputation: 2089

Here is my version, which uses df.rolling() instead and iterates over the columns. I am not completely sure it is what you were looking for, so don't hesitate to comment.

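import numpy as np
import pandas as pd
import statsmodels.regression.linear_model as sm
import statsmodels.tools.tools as sm2

df_returns = pd.DataFrame({'0': [30, 30, 31, 32, 32],
                           '1': [60, 60, 60, 60, 60],
                           '2': [np.nan, np.nan, np.nan, np.nan, np.nan]})


def regress(y, X):
    # Slope of y on the time trend in X, or NaN if the window has gaps
    if not np.isnan(y).any():
        model = sm.OLS(y, X).fit()
        return model.params[1]
    else:
        return np.nan


regression_window = 3
# Design matrix: a constant plus a 0..window-1 time trend, reused for every window
X = np.arange(regression_window)
X = sm2.add_constant(X, has_constant="add")
df_coef = pd.DataFrame()
for col in df_returns.columns:
    # raw=True passes each window to regress() as a NumPy array instead of a Series
    df_coef[col] = df_returns[col].rolling(window=regression_window).apply(
        lambda window: regress(window, X), raw=True)
print(df_coef)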
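With 5,000 rows and 10,000 columns, the rolling-apply approach still runs tens of millions of separate OLS fits. Because the only regressor is a time trend, the slope equals cov(t, y) / var(t), which can be built from rolling means for all columns at once. The sketch below is an alternative under that assumption, not part of the answer above; `rolling_trend_slope` is a hypothetical helper name. Windows containing a NaN come out as NaN because `rolling()` defaults to `min_periods` equal to the window size.

```python
import numpy as np
import pandas as pd


def rolling_trend_slope(df, window):
    """Rolling OLS slope of every column on a time trend (hypothetical helper)."""
    # Global time index 0..N-1; shifting t by a constant leaves the slope unchanged
    t = pd.Series(np.arange(len(df), dtype=float), index=df.index)
    mean_y = df.rolling(window).mean()
    mean_ty = df.mul(t, axis=0).rolling(window).mean()
    mean_t = t.rolling(window).mean()
    # Population covariance E[ty] - E[t]E[y], computed for every column at once
    cov_ty = mean_ty - mean_y.mul(mean_t, axis=0)
    # Population variance of `window` consecutive integers
    var_t = (window ** 2 - 1) / 12.0
    return cov_ty / var_t


df_returns = pd.DataFrame({'0': [30, 30, 31, 32, 32],
                           '1': [60, 60, 60, 60, 60],
                           '2': [np.nan] * 5})
print(rolling_trend_slope(df_returns, 3))
```

On this toy frame it gives the same slopes as the loop above (0.5, 1.0, 0.5 for column '0'). Since it subtracts two large rolling means, it is worth spot-checking a few columns of the real data against the OLS version.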

Upvotes: 1
