I want to run a series of linear regressions for multiple groups and across multiple columns. For the stratification by group (across rows), I can use the idea suggested in "Fitting several regression models with dplyr". In addition, I need to vary the second regressor across different columns. Below is the code I got working with a loop. I wonder whether I can do both in a vectorized manner, using the map function from the purrr package together with group_by from dplyr, and export the estimated beta coefficients and p-values accordingly.
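library(dplyr)
library(broom)
head(mtcars)
# candidate covariates: disp, hp, drat, wt, qsec, vs, am
vec <- names(mtcars)[3:9]
data <- NULL
for (i in seq_along(vec)) {
  # fit mpg ~ disp + vec[i] separately within each cyl group and tidy each fit
  dfCoef <- mtcars %>%
    group_by(cyl) %>%
    do(tidy(lm(paste("mpg ~ disp +", vec[i]), data = .)))
  # keep only the disp coefficient and record which covariate was added
  res <- dfCoef %>%
    filter(term == "disp") %>%
    mutate(con = vec[i])
  data <- bind_rows(data, res)
}
data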
Using tidyr::(un)nest to perform the regressions by group, together with a helper function, this could be achieved like so:
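library(dplyr)
library(broom)
library(tidyr)
library(purrr)

vec <- names(mtcars)[3:9]

# For one covariate, fit mpg ~ disp + <covariate> within each cyl group
# and return the tidied disp coefficient for every group
lm_help <- function(vec) {
  mtcars %>%
    tidyr::nest(data = -cyl) %>%   # one row per cyl; `data` holds that group's rows
    mutate(con = vec,
           fit = purrr::map(data, lm, formula = as.formula(paste0("mpg ~ disp + ", vec))),
           tidy = purrr::map(fit, tidy)) %>%
    select(cyl, con, tidy) %>%
    tidyr::unnest(tidy) %>%
    filter(term == "disp")
}

# Apply the helper to every covariate and stack the results
purrr::map(vec, lm_help) %>%
  bind_rows()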
#> # A tibble: 21 x 7
#> cyl con term estimate std.error statistic p.value
#> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 6 disp disp 0.00361 0.0156 0.232 0.826
#> 2 4 disp disp -0.135 0.0332 -4.07 0.00278
#> 3 8 disp disp -0.0196 0.00932 -2.11 0.0568
#> 4 6 hp disp 0.00180 0.0202 0.0890 0.933
#> 5 4 hp disp -0.120 0.0369 -3.24 0.0120
#> 6 8 hp disp -0.0186 0.00946 -1.97 0.0746
#> 7 6 drat disp 0.0224 0.0292 0.770 0.484
#> 8 4 drat disp -0.133 0.0406 -3.27 0.0114
#> 9 8 drat disp -0.0196 0.00977 -2.01 0.0697
#> 10 6 wt disp 0.0191 0.0109 1.75 0.154
#> # ... with 11 more rows
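For comparison, here is a minimal sketch that stays closer to the question's wording, keeping group_by() and calling purrr inside dplyr::group_modify() instead of nesting. It assumes dplyr >= 0.8.1 (where group_modify() was introduced) and a purrr version that still provides map_dfr() (superseded, but not removed, in purrr 1.0). It also swaps paste() for base R's reformulate() to build each formula. The names `covars` and `res` are only illustrative. It should yield the same 21 disp rows as above, ordered by cyl (4, 6, 8) rather than by first appearance.

```r
library(dplyr)
library(purrr)
library(broom)

covars <- names(mtcars)[3:9]  # disp, hp, drat, wt, qsec, vs, am

res <- mtcars %>%
  group_by(cyl) %>%
  group_modify(function(d, key) {
    # d holds the rows for one cyl value (cyl itself is dropped from d and
    # added back by group_modify); fit one model per extra covariate
    map_dfr(covars, function(v) {
      fit <- lm(reformulate(c("disp", v), response = "mpg"), data = d)
      # keep only the disp coefficient and tag it with the covariate used
      tidy(fit) %>%
        filter(term == "disp") %>%
        mutate(con = v)
    })
  }) %>%
  ungroup()

# beta estimates and p-values, one row per cyl x covariate
res %>% select(cyl, con, estimate, p.value)
```

Because group_modify() re-attaches the grouping column, `cyl` appears in the result without any extra bookkeeping. The final select() keeps just the estimates and p-values the question asks to export.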