Reputation: 613
I have a list with multiple data frames. Each data frame contains three columns (ColumnOne, ColumnTwo and ColumnThree).
list <- list(df1, df2, df3)
I am using lapply to run a regression on each data frame.
regression <- lapply(list, function (x)
lm(x$ColumnOne ~ x$ColumnTwo + x$ColumnThree))
When I display the output of regression, everything seems correct.
Now, I want to use broom::tidy to collect the regression outputs for each data frame in a table.
library(broom)
df <- lapply(regression, function(x)
tidy(regression$x))
df
However, when I display df, it only shows empty (0x0) data frames.
Would appreciate any help!
Upvotes: 2
Views: 1142
Reputation: 4824
I would recommend using the broom package in a slightly different way for applications like this. Here's how:
library(broom)
library(dplyr)  # provides %>%, bind_rows(), group_by() and do()
# simulate data
make_df <- function() {
  data.frame(ColumnOne = rnorm(5),
             ColumnTwo = rnorm(5),
             ColumnThree = rnorm(5))
}
my_list <- list(df1 = make_df(),
                df2 = make_df(),
                df3 = make_df())
# bind the rows of the data frames together and group by origin
my_list %>%
  bind_rows(.id = "df") %>%
  group_by(df) %>%
  do(tidy(lm(ColumnOne ~ ColumnTwo + ColumnThree, data = .)))
The result on the random toy data I made is a dataframe that looks like this:
# A tibble: 9 x 6
# Groups: df [3]
df term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 df1 (Intercept) -1.23 0.840 -1.47 0.280
2 df1 ColumnTwo 0.944 0.573 1.65 0.241
3 df1 ColumnThree -0.532 0.486 -1.09 0.388
4 df2 (Intercept) 0.942 0.718 1.31 0.320
5 df2 ColumnTwo 0.900 1.02 0.885 0.470
6 df2 ColumnThree -0.0596 0.443 -0.135 0.905
7 df3 (Intercept) 0.0453 0.0742 0.610 0.604
8 df3 ColumnTwo 0.554 0.0509 10.9 0.00833
9 df3 ColumnThree -0.229 0.114 -2.00 0.183
Broom's design strategy is to use data frames as much as possible. If you start with a list of data frames that share the same columns, it is easier to combine them into one data frame first; broom then lets you work on that directly instead of mapping functions over a list.
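The lapply approach from the question can also be made to work. Inside the anonymous function, x is already the fitted model, whereas regression$x looks for a list element literally named "x". No such element exists, so tidy() receives NULL and returns an empty result. The following is a minimal sketch on simulated data; the names make_df, my_list, tidied and result are illustrative assumptions, not part of the original code:

```r
library(broom)
library(dplyr)

set.seed(1)  # only so the simulated data are reproducible

# simulated stand-ins for df1, df2 and df3 from the question
make_df <- function() {
  data.frame(ColumnOne = rnorm(5),
             ColumnTwo = rnorm(5),
             ColumnThree = rnorm(5))
}
my_list <- list(df1 = make_df(), df2 = make_df(), df3 = make_df())

# fit one model per data frame; passing data = d keeps the term names clean
regression <- lapply(my_list, function(d) {
  lm(ColumnOne ~ ColumnTwo + ColumnThree, data = d)
})

# there is no element named "x" in the list, so this lookup yields NULL
is.null(regression$x)

# pass each model to tidy() directly
tidied <- lapply(regression, tidy)

# optionally stack the per-model tables, keeping the list names in a "df" column
result <- bind_rows(tidied, .id = "df")
result
```

If the list has no names, bind_rows(.id = "df") falls back to the position of each element as the identifier.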
Upvotes: 0
Reputation: 22588
This is very compact with purrr.
First, simulate some data:
library(tidyverse)
library(broom)
df_list = map(1:3, ~ data.frame(matrix(sample.int(10, 30, replace = TRUE), ncol = 3)))
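Then simply fit your models and tidy the results. Passing a data frame straight to lm() converts it to a formula with the first column as the response and the remaining columns as predictors: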
> df_list %>% map( ~ tidy(lm(.)))
[[1]]
# A tibble: 3 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 7.40 3.08 2.40 0.0474
2 X2 0.0309 0.341 0.0905 0.930
3 X3 -0.0387 0.358 -0.108 0.917
[[2]]
# A tibble: 3 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.63 1.51 3.07 0.0181
2 X2 0.252 0.272 0.923 0.387
3 X3 0.0340 0.261 0.130 0.900
[[3]]
# A tibble: 3 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 6.62 5.68 1.17 0.282
2 X2 0.0946 0.630 0.150 0.885
3 X3 -0.405 0.419 -0.967 0.366
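Since the question asks for a single table, the per-model tibbles can be stacked afterwards. This is a sketch under the same simulated setup; the explicit formula X1 ~ X2 + X3 assumes the default column names that data.frame() gives an unnamed matrix, and the name combined is illustrative:

```r
library(purrr)
library(dplyr)
library(broom)

set.seed(42)  # only so the simulated data are reproducible

# three data frames with 10 rows and columns X1, X2, X3
df_list <- map(1:3, ~ data.frame(matrix(sample.int(10, 30, replace = TRUE), ncol = 3)))

# fit with an explicit formula, tidy each model, then stack the results;
# .id records which list element each row came from
combined <- df_list %>%
  map(~ tidy(lm(X1 ~ X2 + X3, data = .x))) %>%
  bind_rows(.id = "df")

combined
```

Writing the formula out makes the choice of response variable visible, rather than relying on column order.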
Upvotes: 2