Reputation: 5424
Reading the documentation for do()
in dplyr, I've been impressed by the ability to create regression models for groups of data and was wondering whether it would be possible to replicate it using different independent variables rather than groups of data.
So far I've tried
require(dplyr)
data(mtcars)
models <- data.frame(var = c("cyl", "hp", "wt"))
models <- models %>% do(mod = lm(mpg ~ as.name(var), data = mtcars))
Error in as.vector(x, "symbol") :
cannot coerce type 'closure' to vector of type 'symbol'
models <- models %>% do(mod = lm(substitute(mpg ~ i, as.name(.$var)), data = mtcars))
Error in substitute(mpg ~ i, as.name(.$var)) :
invalid environment specified
The desired final output would be something like
var slope standard_error_slope
1 cyl -2.87 0.32
2 hp -0.07 0.01
3 wt -5.34 0.56
I'm aware that something similar is possible using a lapply approach, but find the apply family largely inscrutable. Is there a dplyr solution?
Upvotes: 3
Views: 669
Reputation: 193687
This isn't pure "dplyr", but rather, "dplyr" + "tidyr" + "data.table". Still, I think it should be pretty easily readable.
library(data.table)
library(dplyr)
library(tidyr)
mtcars %>%
gather(var, val, cyl:carb) %>%
as.data.table %>%
.[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
# var Estimate Std. Error
# 1: cyl -2.87579014 0.322408883
# 2: disp -0.04121512 0.004711833
# 3: hp -0.06822828 0.010119304
# 4: drat 7.67823260 1.506705108
# 5: wt -5.34447157 0.559101045
# 6: qsec 1.41212484 0.559210130
# 7: vs 7.94047619 1.632370025
# 8: am 7.24493927 1.764421632
# 9: gear 3.92333333 1.308130699
# 10: carb -2.05571870 0.568545640
If you really just wanted a few variables, start with a vector, not a data.frame
.
models <- c("cyl", "hp", "wt")
mtcars %>%
select_(.dots = c("mpg", models)) %>%
gather(var, val, -mpg) %>%
as.data.table %>%
.[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
# var Estimate Std. Error
# 1: cyl -2.87579014 0.3224089
# 2: hp -0.06822828 0.0101193
# 3: wt -5.34447157 0.5591010
Upvotes: 4
Reputation: 57697
There's nothing particularly complicated about the approach in the linked page. The use of substitute
and as.name
is a bit arcane, but that's easily rectified.
varlist <- names(mtcars)[-1]
models <- lapply(varlist, function(x) {
form <- formula(paste("mpg ~", x))
lm(form, data=mtcars)
})
dplyr is not the be-all and end-all of R programming. I'd suggest getting familiar with the *apply functions as they'll be of use in many situations where dplyr doesn't work.
Upvotes: 7