Daniel Yudkin

Reputation: 524

Sequentially store the results of a series of regressions into a dataframe

Suppose I want to run a series of regressions, like so:

summary(lm(mpg ~ cyl, data = mtcars))
summary(lm(mpg ~ disp, data = mtcars))
summary(lm(mpg ~ wt, data = mtcars))

I want to create a data frame that contains the estimate and standard error from each of these models, preferably with the variable name included. The final output should look like this (Beta is the estimate, Coeff is the standard error):

Variable  Beta  Coeff
cyl       -2.8  .32
disp      -.04  .004
wt        -5.3  .56

I presume it will require a function. Any ideas out there?

Upvotes: 2

Views: 154

Answers (2)

akrun

Reputation: 887691

One option would be to loop through the columns of interest, paste each one into a formula for lm, tidy the output, slice away the first (intercept) row, and select the columns of interest:

library(broom)
library(tidyverse)
map_df(c("cyl", "disp", "wt"), ~
      lm(paste0("mpg ~ ", .x), data = mtcars) %>% 
          tidy %>% 
          slice(-1) %>% 
          select(Variable = term, Beta = estimate, Coeff = std.error))
# A tibble: 3 x 3
#  Variable    Beta   Coeff
#  <chr>      <dbl>   <dbl>
#1 cyl      -2.88   0.322  
#2 disp     -0.0412 0.00471
#3 wt       -5.34   0.559  

Or, using base R:

t(sapply(c("cyl", "disp", "wt"), function(x) 
   summary(lm(paste0("mpg ~ ", x), data = mtcars))$coefficients[-1, 1:2]))
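
Note that the sapply call returns a numeric matrix with the predictors as row names, not a data frame. A minimal sketch of converting it to the layout asked for in the question follows; the object names `vars`, `res`, and `out` are introduced here for illustration only, and the column names are taken from the question:

```r
vars <- c("cyl", "disp", "wt")

# One row per predictor: slope estimate and its standard error
res <- t(sapply(vars, function(x)
  summary(lm(paste0("mpg ~ ", x), data = mtcars))$coefficients[-1, 1:2]))

# Move the row names into a column and rename to match the question
out <- data.frame(Variable  = rownames(res),
                  Beta      = res[, "Estimate"],
                  Coeff     = res[, "Std. Error"],
                  row.names = NULL)
out
```

Passing `row.names = NULL` keeps the predictor names only in the `Variable` column instead of duplicating them as row names.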

Upvotes: 1

MrFlick

Reputation: 206486

One easy way would be to use the purrr and broom packages in the tidyverse.

library(purrr)
library(broom)
cols <- c("cyl", "disp", "wt")

map_df(cols, ~lm(reformulate(.x, "mpg"), data=mtcars) %>% tidy())
#   term        estimate std.error statistic  p.value
#   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
# 1 (Intercept)  37.9      2.07        18.3  8.37e-18
# 2 cyl          -2.88     0.322       -8.92 6.11e-10
# 3 (Intercept)  29.6      1.23        24.1  3.58e-21
# 4 disp         -0.0412   0.00471     -8.75 9.38e-10
# 5 (Intercept)  37.3      1.88        19.9  8.24e-19
# 6 wt           -5.34     0.559       -9.56 1.29e-10

This returns some extra information (the intercept rows and the test statistics), but you can easily filter it out with dplyr if you like.
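
As a rough sketch of that filtering step, assuming the column names `Variable`, `Beta`, and `Coeff` from the question (the object name `slopes` is made up for this example):

```r
library(purrr)
library(broom)
library(dplyr)

cols <- c("cyl", "disp", "wt")

slopes <- map_df(cols, ~ tidy(lm(reformulate(.x, "mpg"), data = mtcars))) %>%
  filter(term != "(Intercept)") %>%   # drop the intercept rows
  select(Variable = term,             # keep and rename the three columns
         Beta     = estimate,
         Coeff    = std.error)
slopes
```

Filtering on the term name rather than on row position keeps the result correct if a model ever has more than one predictor.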

Upvotes: 4
