Reputation: 795
I am looking for a way to construct a function that will allow the user, given a data frame, to choose the dependent variable and the control variable for a fixed set of regressions.
Ideally, I would like something like this:
df <- mtcars
myreg <- function(dv, control) {
  mod1 <- lm(dv ~ control + mpg, data = df)
  mod2 <- lm(dv ~ control + wt, data = df)
  mod3 <- lm(dv ~ control + wt + mpg, data = df)
  return(list("reg1" = mod1, "reg2" = mod2, "reg3" = mod3))
}
My first guess was that providing strings with the names of the columns would work, but R throws an error telling me that the variable lengths differ. Passing bare names instead of strings does not work either.
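A minimal sketch of why the string version fails, using `mtcars` and a hypothetical helper `try_strings()` (not part of the question): inside a formula, `dv` and `control` are plain variable names. `lm()` does not find columns with those names in the data, so it falls back to the function arguments and gets the one-element strings themselves, which cannot be lined up with the 32 rows of `mpg`.

```r
# Hypothetical helper that reproduces the failure described above.
# `dv` and `control` are not columns of `data`, so the formula picks up the
# function arguments instead: the length-1 strings "hp" and "cyl".
try_strings <- function(dv, control, data) {
  tryCatch(
    lm(dv ~ control + mpg, data = data),
    error = function(e) e  # return the error object instead of stopping
  )
}

res <- try_strings("hp", "cyl", mtcars)
inherits(res, "error")  # TRUE: the model could not be fitted
conditionMessage(res)   # reports that the variable lengths differ
```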
I tried a solution based on the get function. However, it does not parse the control correctly and, honestly, I don't understand exactly how it works yet.
What's the correct way of implementing these choices in a function?
Upvotes: 3
Views: 1018
Reputation: 1486
What you are trying to do is to create a wrapper function for regression. Ideally you would program this so that the function can take in an arbitrary dataset and a specified response variable, control variable and other model terms, and then produce all the models of interest.
Usually we want the user to enter variable names without quotes. We capture each such input as an unevaluated expression with substitute() and turn it into a character string with deparse(), i.e. deparse(substitute(...)). For the other model terms in the regression, since there can be arbitrarily many of them, a reasonable input is a list of character vectors, one vector per model. A wrapper function generally fits the models through an "internal call" (here, lm(formula(FORMULAE[[i]]), data = data)), which differs from the standard call a user would type, and the stored call is what print() and summary() display. Consequently, you will usually also want to amend the call stored in each model so that it looks like the standard call. So, using this syntax, you would write something like this:
myreg <- function(response, control, other.terms = NULL, data) {
  # Get the variable names and other terms
  DATA.NAME     <- deparse(substitute(data))
  RESPONSE.NAME <- deparse(substitute(response))
  CONTROL.NAME  <- deparse(substitute(control))
  if (is.null(other.terms)) {
    OTHER <- vector(mode = 'list', length = 1)
  } else {
    OTHER <- other.terms
  }
  # Set up the formula and model objects
  m <- length(OTHER)
  FORMULAE <- vector(mode = 'list', length = m)
  MODELS   <- vector(mode = 'list', length = m)
  names(MODELS) <- sprintf('MODEL%s', 1:m)
  FORM <- paste(RESPONSE.NAME, '~', CONTROL.NAME)
  # Fit the models
  for (i in 1:m) {
    # Set the formula
    if (is.null(OTHER[[i]])) {
      FORMULAE[[i]] <- FORM
    } else {
      FORMULAE[[i]] <- paste(FORM, '+', paste(OTHER[[i]], collapse = '+'))
    }
    # Fit the model, then replace its internal call with the standard one
    MODELS[[i]] <- lm(formula(FORMULAE[[i]]), data = data)
    CALL <- paste0('lm(formula = ', FORMULAE[[i]], ', data = ', DATA.NAME, ')')
    MODELS[[i]]$call <- parse(text = CALL)[[1]]
  }
  # Return the models
  MODELS
}
You can then use the function to produce a list of multiple regression models with the specified variables. Here is an example where you produce three different models, each with the same response and control variable, but with different additional terms in the models:
(MODELS <- myreg(response = hp,
control = cyl,
other.terms = list('mpg', 'wt', c('mpg', 'wt')),
data = mtcars))
$MODEL1
Call:
lm(formula = hp ~ cyl + mpg, data = mtcars)
Coefficients:
(Intercept) cyl mpg
54.067 23.979 -2.775
$MODEL2
Call:
lm(formula = hp ~ cyl + wt, data = mtcars)
Coefficients:
(Intercept) cyl wt
-51.81 31.39 1.33
$MODEL3
Call:
lm(formula = hp ~ cyl + mpg + wt, data = mtcars)
Coefficients:
(Intercept) cyl mpg wt
115.66 25.03 -4.22 -12.13
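An alternative sketch, not part of the answer above: `myreg2()` is a hypothetical variant that builds each formula with `reformulate()` from the stats package and splices it, together with the unevaluated data argument, into the `lm()` call with `bquote()` before evaluating it. Because the call is assembled before fitting, it already reads like a hand-typed one and does not need to be overwritten afterwards. It assumes the same inputs as `myreg()`: bare names for the response and control, and a list of character vectors for the other terms.

```r
# Hypothetical variant of myreg(): build the lm() call itself, so the
# stored call already looks like the standard one.
myreg2 <- function(response, control, other.terms = list(NULL), data) {
  resp <- deparse(substitute(response))  # bare name -> "hp"
  ctrl <- deparse(substitute(control))   # bare name -> "cyl"
  data.expr <- substitute(data)          # unevaluated, e.g. the symbol mtcars
  env <- parent.frame()                  # caller's environment, where data lives
  models <- lapply(other.terms, function(terms) {
    # reformulate() turns c("cyl", "mpg") and "hp" into hp ~ cyl + mpg
    f <- reformulate(c(ctrl, terms), response = resp)
    # Splice the formula and the data expression into the call, then run it
    # in the caller's environment so the data name is found there
    eval(bquote(lm(.(f), data = .(data.expr))), envir = env)
  })
  names(models) <- sprintf("MODEL%s", seq_along(models))
  models
}

models <- myreg2(hp, cyl, list("mpg", "wt", c("mpg", "wt")), mtcars)
models$MODEL3$call
```

The design choice here is to construct the call first and evaluate it second; the answer's approach fits first and patches `$call` afterwards. Both give the same coefficients.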
Upvotes: 1
Reputation: 223
You should work with string substitution. The snippet below gives a simple overview of how you could adjust your function. It would also be good practice to pass the data set df as an additional parameter to your function.
df <- mtcars
## these would be function inputs
dv <- "mpg"
control <- "cyl"
## this would form the function body
tmpl <- "dv ~ control" # create a template formula
tmpl.dv <- gsub("dv", dv, tmpl) # insert the dv
tmpl.dv.control <- gsub("control", control, tmpl.dv) # insert the control
form <- as.formula(tmpl.dv.control) # create the formula to use in lm
## fit the model
mod <- lm(form, data = df)
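Putting both suggestions together (strings for the column names, the data passed as a parameter), here is a minimal sketch of the question's three models as a function. `myreg_str()` is a hypothetical name, and `sprintf()` is swapped in for the two `gsub()` calls: `gsub()` replaces every occurrence of its pattern, so a response named, say, `control_rate` would be partly rewritten by the second substitution, whereas each `%s` slot is filled exactly once.

```r
# Hypothetical wrapper around the template idea; column names are strings.
myreg_str <- function(dv, control, data) {
  extra <- list("mpg", "wt", c("wt", "mpg"))  # extra terms from the question
  models <- lapply(extra, function(terms) {
    # Each %s slot is filled once, so no name can be rewritten by a later step
    form <- as.formula(sprintf("%s ~ %s + %s", dv, control,
                               paste(terms, collapse = " + ")))
    lm(form, data = data)
  })
  names(models) <- c("reg1", "reg2", "reg3")
  models
}

mods <- myreg_str("hp", "cyl", mtcars)
```

One trade-off: the stored call of each model reads `lm(formula = form, data = data)`, which is the cosmetic issue the call-amending approach in the other answer deals with. Note also that using `mpg` or `wt` as `dv` here would put the response on both sides of the formula.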
Upvotes: 1