oceanbeach96
oceanbeach96

Reputation: 634

Export regression results as a csv file when using summary_out

I am running multiple regressions using financial data from Yahoo! Finance and Fama-French factors from http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_factors.html

Single factor regression:

CAPM = sm.ols( formula = 'Exret ~ MKT', data=m).fit(cov_type='HAC',cov_kwds={'maxlags':1})

Three factor regression:

FF3 = sm.ols( formula = 'Exret ~ MKT + SMB + HML',     
data=m).fit(cov_type='HAC',cov_kwds={'maxlags':1})

I then utilise summary_col to create a table with significance stars:

dfoutput = summary_col([CAPM,FF3],stars=True,float_format='%0.4f',
model_names=['GOOG','GOOG'],info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),'Adjusted R2':lambda x: "{:.2f}".format(x.rsquared_adj)}, regressor_order = ['Intercept', 'MKT', 'SMB', 'HML'])

Output

dfoutput
Out[311]: 
<class 'statsmodels.iolib.summary2.Summary'>
"""

=================================
             GOOG I       GOOG II  
---------------------------------
Intercept   -0.0009***   -0.0010***
            (0.0003)      (0.0003)  
MKT         0.0098***     0.0107*** 
            (0.0003)      (0.0003)  
SMB                      -0.0033***
                          (0.0006)  
HML                      -0.0063***
                          (0.0006)  
N              1930         1930      
Adjusted R2    0.37         0.42      
=================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

I have the following two questions:

  1. Is it possible to change the standard errors in parentheses to t-stats?

  2. Is it possible to export the results of the summary_col function to Excel as a csv file?

Upvotes: 1

Views: 8241

Answers (1)

user8682794
user8682794

Reputation:

It is possible to change the standard errors in parentheses to t-statistics but only if you modify the file summary2.py in the statsmodel library.

You just need to replace the function _col_params() in that file with the following version:

def _col_params(result, float_format='%.4f', stars=True):
    '''Stack coefficients and standard errors in single column
    '''

    # Extract parameters
    res = summary_params(result)
    # Format float
    for col in res.columns[:3]:
        res[col] = res[col].apply(lambda x: float_format % x)
    # Std.Errors in parentheses
    res.ix[:, 2] = '(' + res.ix[:, 2] + ')'
    # Significance stars
    if stars:
        idx = res.ix[:, 3] < .1
        res.ix[idx, 0] = res.ix[idx, 0] + '*'
        idx = res.ix[:, 3] < .05
        res.ix[idx, 0] = res.ix[idx, 0] + '*'
        idx = res.ix[:, 3] < .01
        res.ix[idx, 0] = res.ix[idx, 0] + '*'
    # Stack Coefs and Std.Errors
    res = res.ix[:, [0,2]]
    res = res.stack()
    res = pd.DataFrame(res)
    res.columns = [str(result.model.endog_names)]
    return res

It is a good idea to comment out the original function (instead of deleting it) in order to be able to switch back and forth as necessary.

You can then do something along the following lines:

import pandas as pd        
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col

df = pd.read_stata('http://www.stata-press.com/data/r14/auto.dta')
df['cons'] = 1

Y = df['mpg']
X1 = df[['weight', 'cons']]
X2 = df[['weight', 'price', 'cons']]
X3 = df[['weight', 'price', 'length', 'cons']]
X4 = df[['weight', 'price', 'length', 'displacement', 'cons']]

reg1 = sm.OLS(Y, X1).fit()
reg2 = sm.OLS(Y, X2).fit()
reg3 = sm.OLS(Y, X3).fit()
reg4 = sm.OLS(Y, X4).fit()

results = summary_col([reg1, reg2, reg3, reg4],stars=True,float_format='%0.2f',
                  model_names=['Model\n(1)', 'Model\n(2)', 'Model\n(3)',  'Model\n(4)'],
                  info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),
                             'R2':lambda x: "{:.2f}".format(x.rsquared)})

Evidently, the results now include the t-statistics instead of the standard errors:

print(results)

================================================
              Model    Model    Model    Model  
               (1)      (2)      (3)      (4)   
------------------------------------------------
cons         39.44*** 39.44*** 49.68*** 50.02***
             (24.44)  (24.32)  (7.85)   (7.80)  
displacement                            0.00    
                                        (0.44)  
length                         -0.10*   -0.09   
                               (-1.67)  (-1.63) 
price                 -0.00    -0.00    -0.00   
                      (-0.57)  (-1.03)  (-1.03) 
weight       -0.01*** -0.01*** -0.00*   -0.00*  
             (-11.60) (-9.42)  (-1.72)  (-1.67) 
N            74       74       74       74      
R2           0.65     0.65     0.67     0.67    
================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

Note that you will manually have to change the text beneath the table to reflect the change, unless you are prepared to make further changes to the source code.

Once this step is complete, you can simply use the as_text() method to get everything in a string and export this to a text file using the csv library:

results_text = results.as_text()

import csv
resultFile = open("table.csv",'w')
resultFile.write(results_text)
resultFile.close()

Upvotes: 4

Related Questions