Reputation: 857
I am running multiple regressions by different groups. I want to automate things a bit more. I tried to run and save the regressions model1, model2 and model3 initially. Then i tried to shorten the code as the following:
temp <- df %>%
group_by(group) %>%
do(model1, model2, model3, data = .))) %>%
gather(model_name, model, -group) %>%
unnest()
But this did not work. Below you find the long version that works. Can someone guide me on how to make it a bit shorter?
df <- tibble(
a = rnorm(1000),
b = rnorm(1000),
c = rnorm(1000),
d = rnorm(1000),
group =sample.int(300,size=1000,replace=TRUE)-1)
)
df$group = as.factor(df$group)
temp1 <- df %>%
group_by(group) %>%
do(model2 = tidy(lm(a ~ b , data = .))) %>%
gather(model_name, model, -group) %>%
unnest()
temp2 <- df %>%
group_by(group) %>%
do(model2 = tidy(lm(a ~ c , data = .))) %>%
gather(model_name, model, -group) %>%
unnest()
temp3 <- df %>%
group_by(group) %>%
do(model3 = tidy(lm(a ~ d , data = .))) %>%
gather(model_name, model, -group) %>%
unnest()
Upvotes: 3
Views: 73
Reputation: 30474
This might work, using nest_by
and map
from purrr
.
Instead of group_by
try using nest_by
(dplyr
version 1.0.0) and run your model on each row of nested data. Using nest_by
will create a new temporary list column data
. It is similar to previously using nest
and rowwise
. The model needs to be in list as well here.
Using map
you can conduct models for each independent variable "b", "c", and "d". Independent variable is also included as separate column (could also be a label for the particular model).
library(tidyverse)
library(purrr)
library(broom)
df %>%
nest_by(group) %>%
mutate(model = list(map(c("b", "c", "d"), ~
cbind(independent = .x,
tidy(lm(formula(paste0("a ~ ", .x)), data = data)))))) %>%
summarise(bind_rows(model))
Output
group independent term estimate std.error statistic p.value
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 0 b (Intercept) 0.0480 NaN NaN NaN
2 0 b b 0.268 NaN NaN NaN
3 0 c (Intercept) -0.124 NaN NaN NaN
4 0 c c -0.447 NaN NaN NaN
5 0 d (Intercept) -0.107 NaN NaN NaN
6 0 d d 0.377 NaN NaN NaN
7 1 b (Intercept) 0.473 0.296 1.60 0.356
8 1 b b 0.383 0.261 1.47 0.380
9 1 c (Intercept) 0.547 0.544 1.01 0.498
10 1 c c -0.183 0.798 -0.229 0.857
Edit (12/19/20): In case you want to include models with multiple covariates and interaction terms, you could simply provide the formula in the string.
For example, if you want to run 3 models per group
:
You could do the following:
df %>%
nest_by(group) %>%
mutate(model = list(map(c("b + c + b*c", "c + d", "d"), ~
cbind(model = .x,
tidy(lm(formula(paste0("a ~ ", .x)), data = data)))))) %>%
summarise(bind_rows(model))
Output
group model term estimate std.error statistic p.value
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 0 b + c + b*c (Intercept) 0.718 0.281 2.56 0.0835
2 0 b + c + b*c b 0.819 0.348 2.35 0.100
3 0 b + c + b*c c -0.351 0.315 -1.11 0.346
4 0 b + c + b*c b:c 0.0444 0.461 0.0964 0.929
5 0 c + d (Intercept) 0.614 0.409 1.50 0.208
6 0 c + d c -0.271 0.439 -0.618 0.570
7 0 c + d d 0.182 0.487 0.374 0.727
8 0 d (Intercept) 0.605 0.383 1.58 0.175
9 0 d d 0.208 0.455 0.456 0.667
10 1 b + c + b*c (Intercept) 0.590 NaN NaN NaN
# … with 2,600 more rows
Or, if you want, you can list the equations completely and separately like this:
my_models <- c(
"a ~ b + c + b*c",
"a ~ c + d",
"a ~ d"
)
df %>%
nest_by(group) %>%
mutate(model = list(map(my_models, ~
cbind(model = .x,
tidy(lm(formula(.x), data = data)))))) %>%
summarise(bind_rows(model))
Output
group model term estimate std.error statistic p.value
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 0 a ~ b + c + b*c (Intercept) 0.718 0.281 2.56 0.0835
2 0 a ~ b + c + b*c b 0.819 0.348 2.35 0.100
3 0 a ~ b + c + b*c c -0.351 0.315 -1.11 0.346
4 0 a ~ b + c + b*c b:c 0.0444 0.461 0.0964 0.929
5 0 a ~ c + d (Intercept) 0.614 0.409 1.50 0.208
6 0 a ~ c + d c -0.271 0.439 -0.618 0.570
7 0 a ~ c + d d 0.182 0.487 0.374 0.727
8 0 a ~ d (Intercept) 0.605 0.383 1.58 0.175
9 0 a ~ d d 0.208 0.455 0.456 0.667
10 1 a ~ b + c + b*c (Intercept) 0.590 NaN NaN NaN
# … with 2,600 more rows
Data
set.seed(123)
df <- tibble(
a = rnorm(1000),
b = rnorm(1000),
c = rnorm(1000),
d = rnorm(1000),
group =sample.int(300,size=1000,replace=TRUE)-1)
)
df$group = as.factor(df$group)
Upvotes: 1