msh855

Reputation: 1571

Generate statistical tables in Python and export to Excel

I want to generate high-quality statistical tables in Python for use in publications.

In Stata, one can use the community-contributed family of commands estout:

sysuse auto, clear

regress mpg weight
estimates store A

regress mpg weight price 
estimates store B

regress mpg weight price length
estimates store C

regress mpg weight price length displacement
estimates store D

esttab A B C D, se r2 nonumber mtitle("Model 1" "Model 2" "Model 3" "Model 4")

----------------------------------------------------------------------------
                  Model 1         Model 2         Model 3         Model 4   
----------------------------------------------------------------------------
weight           -0.00601***     -0.00582***     -0.00304        -0.00354   
               (0.000518)      (0.000618)       (0.00177)       (0.00212)   

price                          -0.0000935       -0.000173       -0.000174   
                               (0.000163)      (0.000168)      (0.000169)   

length                                            -0.0966         -0.0947   
                                                 (0.0577)        (0.0582)   

displacement                                                      0.00433   
                                                                (0.00983)   

_cons               39.44***        39.44***        49.68***        50.02***
                  (1.614)         (1.622)         (6.329)         (6.410)   
----------------------------------------------------------------------------
N                      74              74              74              74   
R-sq                0.652           0.653           0.666           0.667   
----------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

How can I run multiple regressions in Python and summarise the results in a table like this one?

I would also like to be able to export these tables to Excel.

Upvotes: 4

Views: 2850

Answers (2)

user23991978

Reputation: 1

Here is a revised version of @user8682794's solution, wrapped in a function that loops over the model specifications and saves the output as an Excel file:

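import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col

def write_to_table(df, y_var, X_list, file_name):
    """Fit one OLS model per list of regressors in X_list and save the table to Excel."""
    reg = []
    names = []
    for i, x_vars in enumerate(X_list, start=1):
        # Copy the regressors and add a constant, so neither df nor X_list is modified
        X = df[list(x_vars)].assign(cons=1)
        # Drop rows with missing values instead of filling them with 0,
        # which would silently change the estimates
        reg.append(sm.OLS(df[y_var], X, missing='drop').fit())
        names.append(f'Model\n({i})')

    results = summary_col(reg, stars=True, float_format='%0.2f',
                          model_names=names,
                          info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                                     'R2': lambda x: "{:.2f}".format(x.rsquared)})

    # results.tables[0] is a pandas DataFrame; writing .xlsx requires openpyxl
    results.tables[0].to_excel(file_name)
    return results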

Upvotes: 0

user8682794

You can use the summary_col() function from statsmodels:

import pandas as pd        
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col

df = pd.read_stata('http://www.stata-press.com/data/r14/auto.dta')
df['cons'] = 1

Y = df['mpg']
X1 = df[['weight', 'cons']]
X2 = df[['weight', 'price', 'cons']]
X3 = df[['weight', 'price', 'length', 'cons']]
X4 = df[['weight', 'price', 'length', 'displacement', 'cons']]

reg1 = sm.OLS(Y, X1).fit()
reg2 = sm.OLS(Y, X2).fit()
reg3 = sm.OLS(Y, X3).fit()
reg4 = sm.OLS(Y, X4).fit()

results = summary_col([reg1, reg2, reg3, reg4], stars=True, float_format='%0.2f',
                      model_names=['Model\n(1)', 'Model\n(2)', 'Model\n(3)', 'Model\n(4)'],
                      info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                                 'R2': lambda x: "{:.2f}".format(x.rsquared)})

Printing the summary gives the following table:

print(results)

================================================
              Model    Model    Model    Model  
               (1)      (2)      (3)      (4)   
------------------------------------------------
cons         39.44*** 39.44*** 49.68*** 50.02***
             (1.61)   (1.62)   (6.33)   (6.41)  
displacement                            0.00    
                                        (0.01)  
length                         -0.10*   -0.09   
                               (0.06)   (0.06)  
price                 -0.00    -0.00    -0.00   
                      (0.00)   (0.00)   (0.00)  
weight       -0.01*** -0.01*** -0.00*   -0.00*  
             (0.00)   (0.00)   (0.00)   (0.00)  
N            74       74       74       74      
R2           0.65     0.65     0.67     0.67    
================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

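Since results.tables[0] is a pandas DataFrame, you can export it directly to Excel or CSV:

# summary_col returns a Summary object whose first table is a DataFrame
table = results.tables[0]

table.to_excel("table.xlsx")  # writing .xlsx requires openpyxl
table.to_csv("table.csv")     # a genuine comma-separated file

# Alternatively, save the plain-text layout shown above
with open("table.txt", "w") as f:
    f.write(results.as_text())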

Upvotes: 5
