Dan Ghara
Dan Ghara

Reputation: 27

Regression analysis,using statsmodels

Please help me for getting output from this code.why the output of this code is nan?!!!whats my wrong?

import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
import matplotlib.pyplot as plt
import math
import datetime as dt
#importing Data
es_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt'
vs_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/h_vstoxx.txt'
#creating DataFrame
cols=['SX5P','SX5E','SXXP','SXXE','SXXF','SXXA','DK5f','DKXF']
es=pd.read_csv(es_url,index_col=0,parse_dates=True,sep=';',dayfirst=True,header=None,skiprows=4,names=cols)
vs=pd.read_csv(vs_url,index_col=0,header=2,parse_dates=True,sep=',',dayfirst=True)
data=pd.DataFrame({'EUROSTOXX' : es['SX5E'][es.index > dt.datetime(1999,1,1)]},dtype=float)
data=data.join(pd.DataFrame({'VSTOXX' : vs['V2TX'][vs.index > dt.datetime(1999,1,1)]},dtype=float))
data=data.fillna(method='ffill')
rets=(((data/data.shift(1))-1)*100).round(2)
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']
#regression analysis
model = smf.ols('ydat ~ xdat',data=rets).fit()
print model.summary()

Upvotes: 0

Views: 325

Answers (1)

Jarad
Jarad

Reputation: 18883

The problem is, when you compute rets, you divide by zero which causes an inf. Also, when you use shift, you have NaNs so you have missing values that need to be handled in some way first before proceeding to the regression.

Walk through this example using your data and see:

df = data.loc['2016-03-20':'2016-04-01'].copy()

df looks like:

            EUROSTOXX   VSTOXX
2016-03-21    3048.77  35.6846
2016-03-22    3051.23  35.6846
2016-03-23    3042.42  35.6846
2016-03-24    2986.73  35.6846
2016-03-25       0.00  35.6846
2016-03-28       0.00  35.6846
2016-03-29    3004.87  35.6846
2016-03-30    3044.10  35.6846
2016-03-31    3004.93  35.6846
2016-04-01    2953.28  35.6846

Shifting by 1 and dividing:

df = (((df/df.shift(1))-1)*100).round(2)

Prints out:

             EUROSTOXX  VSTOXX
2016-03-21         NaN     NaN
2016-03-22    0.080688     0.0
2016-03-23   -0.288736     0.0
2016-03-24   -1.830451     0.0
2016-03-25 -100.000000     0.0
2016-03-28         NaN     0.0
2016-03-29         inf     0.0
2016-03-30    1.305547     0.0
2016-03-31   -1.286751     0.0
2016-04-01   -1.718842     0.0

Take-aways: shifting by 1 automatically always creates a NaN at the top. Dividing 0.00 by 0.00 produces an inf.

One possible solution to handle missing values:

...
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']

# handle missing values
messed_up_indices = xdat[xdat.isin([-np.inf, np.inf, np.nan]) == True].index
xdat[messed_up_indices] = xdat[messed_up_indices].replace([-np.inf, np.inf], np.nan)
xdat[messed_up_indices] = xdat[messed_up_indices].fillna(xdat.mean())
ydat[messed_up_indices] = ydat[messed_up_indices].fillna(0.0)

#regression analysis
model = smf.ols('ydat ~ xdat',data=rets, missing='raise').fit()
print(model.summary())

Notice I added the missing='raise' parameter to ols to see what's going on.

End result prints out:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   ydat   R-squared:                       0.259
Model:                            OLS   Adj. R-squared:                  0.259
Method:                 Least Squares   F-statistic:                     1593.
Date:                Wed, 03 Jan 2018   Prob (F-statistic):          5.76e-299
Time:                        12:01:14   Log-Likelihood:                -13856.
No. Observations:                4554   AIC:                         2.772e+04
Df Residuals:                    4552   BIC:                         2.773e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.1608      0.075      2.139      0.033       0.013       0.308
xdat          -1.4209      0.036    -39.912      0.000      -1.491      -1.351
==============================================================================
Omnibus:                     4280.114   Durbin-Watson:                   2.074
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          4021394.925
Skew:                          -3.446   Prob(JB):                         0.00
Kurtosis:                     148.415   Cond. No.                         2.11
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Upvotes: 2

Related Questions