Oden

Reputation: 23

summary dataframe from several multiple regression outputs

I am running several multiple OLS regressions. For one of them I used lm() as follows:

GroupNetReturnsStockPickers <- read.csv("GroupNetReturnsStockPickers.csv", header=TRUE, sep=",", dec=".")
ModelGroupNetReturnsStockPickers <- lm(StockPickersNet ~ Mkt.RF+SMB+HML+WML, data=GroupNetReturnsStockPickers)
names(GroupNetReturnsStockPickers)
summary(ModelGroupNetReturnsStockPickers)

Which gives me the summary output of:

Call:
lm(formula = StockPickersNet ~ Mkt.RF + SMB + HML + WML, data = GroupNetReturnsStockPickers)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.029698 -0.005069 -0.000328  0.004546  0.041948 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.655e-05  5.981e-04   0.078    0.938
Mkt.RF      -1.713e-03  1.202e-02  -0.142    0.887
SMB          3.006e-02  2.545e-02   1.181    0.239
HML          1.970e-02  2.350e-02   0.838    0.403
WML          1.107e-02  1.444e-02   0.766    0.444

Residual standard error: 0.009029 on 251 degrees of freedom
Multiple R-squared:  0.01033,   Adjusted R-squared:  -0.005445 
F-statistic: 0.6548 on 4 and 251 DF,  p-value: 0.624

This is perfect. However, I am running a total of 10 multiple OLS regressions, and I want to build my own summary in a data frame that holds the intercept estimate, its t-value, and its p-value for each of the 10 analyses. The result would be a 3x10 table, with column names Model1, Model2, ..., Model10 and row names Value, t-value and p-Value.

I appreciate any help.

Upvotes: 2

Views: 5141

Answers (2)

Raad

Reputation: 2715

There are a few packages that do this (stargazer and texreg), as well as this code for outreg.

In any case, if you are only interested in the intercept, here is one approach:

# Estimate a bunch of different models, stored in a list
fits <- list() # Create empty list to store models
fits$model1 <- lm(Ozone ~ Solar.R, data = airquality)
fits$model2 <- lm(Ozone ~ Solar.R + Wind, data = airquality)
fits$model3 <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

# Combine the results for the intercept
do.call(cbind, lapply(fits, function(z) summary(z)$coefficients["(Intercept)", ]))


# RESULT:
#                  model1       model2        model3
# Estimate   18.598727772 7.724604e+01 -64.342078929
# Std. Error  6.747904163 9.067507e+00  23.054724347
# t value     2.756222869 8.518995e+00  -2.790841389
# Pr(>|t|)    0.006856021 1.052118e-13   0.006226638
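
To keep only the three rows the question asks for (estimate, t value and p-value), the combined matrix can be subset by row name. This is a minimal sketch that reuses the airquality models from above; the names `intercepts`, `wanted` and `result` are illustrative only.

```r
# Same three example models as above, stored in a named list
fits <- list(
  model1 = lm(Ozone ~ Solar.R, data = airquality),
  model2 = lm(Ozone ~ Solar.R + Wind, data = airquality),
  model3 = lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
)

# One column per model, one row per statistic of the intercept
intercepts <- sapply(fits, function(z) summary(z)$coefficients["(Intercept)", ])

# Keep only estimate, t value and p-value, then convert to a data frame
wanted <- c("Estimate", "t value", "Pr(>|t|)")
result <- as.data.frame(intercepts[wanted, ])
rownames(result) <- c("Value", "t-value", "p-Value")
result
```

With ten models in the list, the same code should give the 3x10 layout, with one column per list name.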

Upvotes: 2

coffeinjunky

Reputation: 11514

Look at the broom package, which was created to do exactly what you are asking for. The only difference is that it puts the models into rows and the statistics into columns, whereas you asked for the opposite. You can transpose the result afterwards if that is really necessary.

To give you an example, the function tidy() converts a model output into a dataframe.

model <- lm(mpg ~ cyl, data=mtcars)
summary(model) 

Call:
lm(formula = mpg ~ cyl, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10

And

library(broom)
tidy(model)

yields the following data frame:

         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

Look at ?tidy.lm to see more options, for instance for confidence intervals, etc.
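
For instance, confidence intervals can be requested through the `conf.int` and `conf.level` arguments of tidy(). A short sketch using the same mtcars model; the variable name `ci` is just for illustration.

```r
library(broom)

model <- lm(mpg ~ cyl, data = mtcars)

# conf.int = TRUE appends conf.low and conf.high columns;
# conf.level sets the coverage (0.95 is the default)
ci <- tidy(model, conf.int = TRUE, conf.level = 0.95)
ci
```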

To combine the output of your ten models into one data frame, you could call bind_rows() on the tidied outputs (here named one, two, three, and so on):

library(dplyr)
bind_rows(one, two, three, ... , .id="models")
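
If the fitted models are kept in a named list, the tidied outputs can be stacked in one step and the intercept rows picked out afterwards. The following is a sketch under that assumption; the list `fits` and the airquality models are illustrative stand-ins for your ten regressions, and the final transpose is optional.

```r
library(broom)
library(dplyr)

# Hypothetical stand-ins for the ten models
fits <- list(
  Model1 = lm(Ozone ~ Solar.R, data = airquality),
  Model2 = lm(Ozone ~ Solar.R + Wind, data = airquality)
)

# Tidy each model, then stack; .id stores the list names in a "model" column
all_terms <- bind_rows(lapply(fits, tidy), .id = "model")

# Keep only the intercept rows and the three requested statistics
intercepts <- all_terms %>%
  filter(term == "(Intercept)") %>%
  select(model, estimate, statistic, p.value)

# Optional: flip so that models are columns and statistics are rows
wide <- t(as.matrix(intercepts[, -1]))
colnames(wide) <- intercepts$model
wide
```

Naming the list elements is what makes `.id` useful here: without names, the new column would only hold the position of each model in the list.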

Or, if your models are the same regression fitted to different groups of one data frame, you can combine tidy() with dplyr's grouping:

models <- mtcars %>% group_by(gear) %>% do(data.frame(tidy(lm(mpg ~ cyl, data = .), conf.int = TRUE)))
models

Source: local data frame [6 x 8]
Groups: gear

  gear        term  estimate std.error statistic      p.value  conf.low  conf.high
1    3 (Intercept) 29.783784 4.5468925  6.550360 1.852532e-05 19.960820 39.6067478
2    3         cyl -1.831757 0.6018987 -3.043297 9.420695e-03 -3.132080 -0.5314336
3    4 (Intercept) 41.275000 5.9927925  6.887440 4.259099e-05 27.922226 54.6277739
4    4         cyl -3.587500 1.2587382 -2.850076 1.724783e-02 -6.392144 -0.7828565
5    5 (Intercept) 40.580000 3.3238331 12.208796 1.183209e-03 30.002080 51.1579205
6    5         cyl -3.200000 0.5308798 -6.027730 9.153118e-03 -4.889496 -1.5105036
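
In more recent dplyr releases, do() is marked as superseded. A sketch of the same grouped regression written with group_modify(), assuming dplyr 0.8.1 or later is installed; printed output may be formatted differently from the table above.

```r
library(broom)
library(dplyr)

# Fit mpg ~ cyl within each gear group and tidy each fit;
# .x is the data for the current group (without the gear column)
models <- mtcars %>%
  group_by(gear) %>%
  group_modify(~ tidy(lm(mpg ~ cyl, data = .x), conf.int = TRUE))

models
```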

Upvotes: 2
