Using the data set `mtcars` as an example: the goal is to write a function that runs multiple regression models with changing independent variables and a dependent variable that can also be changed.

In the code that I wrote (below), `var` holds the independent variables and `mpg` is the dependent variable. I used `map` to run the regression repeatedly, with `vs` and `am` as the changing independent variable each time.
```r
library(dplyr)
library(purrr)
library(broom)

var <- c("vs", "am")
mtcars %>%
  select(all_of(var)) %>%
  map(~ glm(mpg ~ .x + cyl + disp + splines::ns(wt, 2) + hp,
            family = gaussian(link = "identity"),
            data = mtcars)) %>%
  map_dfr(tidy, conf.int = TRUE, .id = "source") %>%
  select(source, term, estimate, std.error, conf.low, conf.high, p.value)
```
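A side effect of mapping over the selected columns is that each model receives the column as an anonymous vector, so its coefficient is labelled with the placeholder name rather than `vs` or `am` (with the purrr lambda above it should appear as `.x` in both models, which is why `.id = 'source'` is needed to tell them apart). A minimal base-R sketch of the same effect, using `lapply()` and a shorter `lm()` model as stand-ins for `map()` and the full `glm()` call:

```r
# Loop over the columns themselves, as map() over select(all_of(var)) does.
# Inside the function each column is just a vector called `col`, so the
# formula term (and hence the coefficient name) is "col" in every model.
cols <- mtcars[c("vs", "am")]
fits_by_column <- lapply(cols, function(col) lm(mpg ~ col + cyl, data = mtcars))

# The list names come from the columns, but the coefficient names do not.
names(fits_by_column)            # "vs" "am"
names(coef(fits_by_column$vs))   # "(Intercept)" "col" "cyl"
```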
I would like to run the same regression with a different set of independent variables, and also with a `y` that I can specify (e.g., I ran it with `mpg` above and would like to change it to `qsec` or some other variable). So I envision a function like this:
```r
function_name <- function(x, y, dataset) {
  dataset %>%
    select(all_of(x)) %>%
    map(~ glm(y ~ .x + cyl + disp + splines::ns(wt, 2) + hp,
              family = gaussian(link = "identity"),
              data = dataset)) %>%
    map_dfr(tidy, conf.int = TRUE, .id = "source") %>%
    select(source, term, estimate, std.error, conf.low, conf.high, p.value)
}
```
But this function didn't work. Any suggestions?
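The failure can be checked directly: a formula stores the symbols it is written with, so `y ~ ...` refers to an object literally named `y` (inside the function, the one-element string) rather than to the column whose name that string holds. A small base-R check, with the covariates shortened for brevity:

```r
y <- "mpg"

# Writing `y ~ ...` captures the symbol `y`, not the string "mpg" stored in it,
# so the model would look up the object `y` instead of the mpg column.
f_literal <- y ~ wt + hp
all.vars(f_literal)   # "y" "wt" "hp"

# Pasting the stored name into a string and parsing it gives the intended response.
f_built <- as.formula(paste(y, "~ wt + hp"))
all.vars(f_built)     # "mpg" "wt" "hp"
```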
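You could achieve your desired result like so:

1. `y ~ ...` will not work, because a formula keeps the symbol `y` itself rather than the column name stored in it. Instead you could use `reformulate` (or `as.formula`) to dynamically create the formula for your regression model.
2. Loop over `x`, or more precisely `setNames(x, x)`, instead of looping over `dataset %>% select(all_of(x))`. That way `.x` is the column name as a string, which `reformulate` can use as a term label.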
```r
library(dplyr)
library(purrr)
library(broom)

function_name <- function(x, y, dataset) {
  map(setNames(x, x), ~ glm(
    reformulate(
      termlabels = c(.x, "cyl", "disp", "splines::ns(wt, 2)", "hp"),
      response = y
    ),
    family = gaussian(link = "identity"),
    data = dataset
  )) %>%
    map_dfr(tidy, conf.int = TRUE, .id = "source") %>%
    select(source, term, estimate, std.error, conf.low, conf.high, p.value)
}

var <- c("vs", "am")
function_name(x = var, y = "mpg", mtcars)
#> # A tibble: 14 × 7
#>    source term                estimate std.error conf.low conf.high  p.value
#>    <chr>  <chr>                  <dbl>     <dbl>    <dbl>     <dbl>    <dbl>
#>  1 vs     (Intercept)             32.7      3.49     25.8      39.5 1.24e- 9
#>  2 vs     vs                      1.03      1.52    -1.95      4.01 5.05e- 1
#>  3 vs     cyl                   -0.187     0.821    -1.80      1.42 8.21e- 1
#>  4 vs     disp                0.000545    0.0119  -0.0228    0.0239 9.64e- 1
#>  5 vs     splines::ns(wt, 2)1    -22.4      4.82    -31.9     -13.0 9.02e- 5
#>  6 vs     splines::ns(wt, 2)2    -9.48      3.16    -15.7     -3.28 6.09e- 3
#>  7 vs     hp                   -0.0202    0.0115  -0.0427   0.00226 9.02e- 2
#>  8 am     (Intercept)             34.6      2.65     29.4      39.8 1.15e-12
#>  9 am     am                    0.0113      1.57    -3.06      3.08 9.94e- 1
#> 10 am     cyl                   -0.470     0.714    -1.87     0.931 5.17e- 1
#> 11 am     disp                0.000796    0.0125  -0.0236    0.0252 9.50e- 1
#> 12 am     splines::ns(wt, 2)1    -21.5      5.86    -33.0     -10.0 1.14e- 3
#> 13 am     splines::ns(wt, 2)2    -9.21      3.34    -15.8     -2.66 1.07e- 2
#> 14 am     hp                   -0.0214    0.0136  -0.0480   0.00527 1.28e- 1
```
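For readers who want to avoid the tidyverse dependencies, the same idea can be sketched in base R, with `lapply()` standing in for `purrr::map()` (a swap, not what the answer above uses). The sketch also checks that `reformulate()` and a pasted string passed to `as.formula()` build the same formula, and that naming the input with `setNames(x, x)` makes both the list names and the coefficient labels `vs` and `am`. `fit_models` is a hypothetical helper name, and `qsec` is used as the response to show the `y` argument changing.

```r
# Hypothetical base-R helper: one Gaussian GLM per independent variable in x,
# with the response y given as a string.
fit_models <- function(x, y, dataset) {
  covariates <- c("cyl", "disp", "splines::ns(wt, 2)", "hp")
  lapply(setNames(x, x), function(v) {
    glm(reformulate(termlabels = c(v, covariates), response = y),
        family = gaussian(link = "identity"),
        data = dataset)
  })
}

# reformulate() and as.formula() on a pasted string give the same formula.
f1 <- reformulate(c("vs", "cyl", "disp", "splines::ns(wt, 2)", "hp"), response = "qsec")
f2 <- as.formula("qsec ~ vs + cyl + disp + splines::ns(wt, 2) + hp")
identical(deparse(f1), deparse(f2))   # TRUE

fits <- fit_models(x = c("vs", "am"), y = "qsec", dataset = mtcars)
names(fits)                 # "vs" "am"
names(coef(fits$am))[1:2]   # "(Intercept)" "am"
```

Because the loop runs over the named character vector rather than over the columns, each model sees the real column name, so the term is labelled `vs` or `am` instead of a placeholder.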