Soph2010

Reputation: 613

How to convert a list of regression outputs into data frames with broom::tidy using lapply?

I have a list with multiple data frames. Each data frame contains three columns (ColumnOne, ColumnTwo and ColumnThree).

list <- list(df1, df2, df3)

I am using lapply to run a regression on each data frame.

regression <- lapply(list, function (x) 
  lm(x$ColumnOne ~ x$ColumnTwo + x$ColumnThree))

When I display the output of regression, everything seems correct.

Now, I want to use broom::tidy to collect the regression outputs for each data frame in a table.

library(broom)
df <- lapply(regression, function(x)
  tidy(regression$x))
df

However, when I display df, it only shows empty (0x0) data frames.

Would appreciate any help!

Upvotes: 2

Views: 1142

Answers (2)

Curt F.

Reputation: 4824

I would recommend using the broom package in a slightly different way for applications like this. Here's how:

library(broom)
library(dplyr)  # provides %>%, bind_rows(), group_by() and do()

# simulate data
make_df <- function(){data.frame(ColumnOne = rnorm(5), 
                                 ColumnTwo=rnorm(5),
                                 ColumnThree=rnorm(5)
                                 )
                     }

my_list <- list(df1 = make_df(), 
                df2 = make_df(),
                df3=make_df()
                )

# bind the rows of the dataframes together and group by origin
my_list %>% 
     bind_rows(.id='df') %>% 
     group_by(df) %>% 
     do(tidy(lm(data=., 
                formula=ColumnOne ~ ColumnTwo + ColumnThree
                )
             )
        )

The result on the random toy data I made is a dataframe that looks like this:

# A tibble: 9 x 6
# Groups:   df [3]
  df    term        estimate std.error statistic p.value
  <chr> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 df1   (Intercept)  -1.23      0.840     -1.47  0.280  
2 df1   ColumnTwo     0.944     0.573      1.65  0.241  
3 df1   ColumnThree  -0.532     0.486     -1.09  0.388  
4 df2   (Intercept)   0.942     0.718      1.31  0.320  
5 df2   ColumnTwo     0.900     1.02       0.885 0.470  
6 df2   ColumnThree  -0.0596    0.443     -0.135 0.905  
7 df3   (Intercept)   0.0453    0.0742     0.610 0.604  
8 df3   ColumnTwo     0.554     0.0509    10.9   0.00833
9 df3   ColumnThree  -0.229     0.114     -2.00  0.183  

Broom's design strategy is to use dataframes as much as possible. If you are starting with a list of dataframes that have the same columns, it's easier to combine them into one dataframe, and after that broom lets you work on it directly, instead of having to do functional programming on lists.
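
If you would rather keep the original lapply() approach, here is a minimal sketch. In the question, `regression$x` looks up a list element literally named "x", which does not exist, so tidy() is handed NULL; that is consistent with the empty data frames reported. Inside the anonymous function the argument already is the fitted model, so it can be tidied directly. The toy data, the seed, and the names `df_list`, `models`, `tidied` and `combined` are assumptions for illustration and are not taken from the question.

```r
library(broom)

# Hypothetical toy data standing in for df1, df2 and df3 from the question
set.seed(1)
make_df <- function() {
  data.frame(ColumnOne   = rnorm(10),
             ColumnTwo   = rnorm(10),
             ColumnThree = rnorm(10))
}
df_list <- list(df1 = make_df(), df2 = make_df(), df3 = make_df())

# Fit one model per data frame; using `data = d` keeps the term names
# free of the `x$` prefix that `lm(x$ColumnOne ~ ...)` would produce
models <- lapply(df_list, function(d) {
  lm(ColumnOne ~ ColumnTwo + ColumnThree, data = d)
})

# Each list element is already a fitted model, so pass it straight to tidy()
tidied <- lapply(models, tidy)

# Optional: stack the results, keeping the list names in a `df` column
combined <- dplyr::bind_rows(tidied, .id = "df")
```

Naming the list elements is what lets bind_rows() label each block of rows by its source data frame.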

Upvotes: 0

John Colby

Reputation: 22588

This is very compact with purrr.

First, simulate some data:

library(tidyverse)
library(broom)

df_list = map(1:3, ~ data.frame(matrix(sample.int(10, 30, replace = TRUE), ncol = 3)))

Then simply fit your models and sweep out the results:

> df_list %>% map( ~ tidy(lm(.)))
[[1]]
# A tibble: 3 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   7.40       3.08     2.40    0.0474
2 X2            0.0309     0.341    0.0905  0.930
3 X3           -0.0387     0.358   -0.108   0.917

[[2]]
# A tibble: 3 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   4.63       1.51      3.07   0.0181
2 X2            0.252      0.272     0.923  0.387
3 X3            0.0340     0.261     0.130  0.900

[[3]]
# A tibble: 3 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   6.62       5.68      1.17    0.282
2 X2            0.0946     0.630     0.150   0.885
3 X3           -0.405      0.419    -0.967   0.366
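
If a single table is more convenient than a list of tibbles, the same pipeline can be followed by bind_rows(). This is a sketch under a few assumptions: the seed, the `results` name and the "model" id column are illustrative, and the formula is spelled out using the default X1, X2, X3 column names that data.frame(matrix(...)) generates, rather than relying on lm() to infer it from the data frame.

```r
library(purrr)
library(dplyr)
library(broom)

# Simulated data, as above; the seed is an assumption for reproducibility
set.seed(42)
df_list <- map(1:3, ~ data.frame(matrix(sample.int(10, 30, replace = TRUE), ncol = 3)))

# Fit one model per data frame, tidy each fit, then stack the tibbles.
# `.id = "model"` adds a column recording which list element each row came from.
results <- df_list %>%
  map(~ tidy(lm(X1 ~ X2 + X3, data = .x))) %>%
  bind_rows(.id = "model")

results
```

Because `df_list` is unnamed here, the "model" column holds list positions; naming the list elements would carry those names through instead.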

Upvotes: 2
