tjebo

Reputation: 23807

Use function arguments in lm formula within function environment

Consider the following function:

lm_eqn <- function(df, indep, dep){
  
  lm(formula = dep ~ indep, data = df)
}

lm_eqn(iris, Sepal.Length, Sepal.Width)  ## does not work, throws error. 

I tried quoting and unquoting in several ways. None of them was successful; each threw a different error, and none of the errors was particularly helpful to me:

Using deparse(substitute(dep))

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

Using quo(dep) or enquo(dep) or expr(dep)

Error in model.frame.default(formula = dep ~ indep, data = df, drop.unused.levels = TRUE) : object is not a matrix

Using above with unquoting using !!:

Error in !dep : invalid argument type

Specifying the variable names for the formula within the function body works:

lm_eqn2 <- function(df){
  
  lm(formula = Sepal.Length ~ Sepal.Width, data = df)
}

lm_eqn2(iris)

# Call:
# lm(formula = Sepal.Length ~ Sepal.Width, data = df)

# Coefficients:
# (Intercept)  Sepal.Width  
#     6.5262      -0.2234 

What am I missing?

Upvotes: 2

Views: 367

Answers (4)

alistaire

Reputation: 43354

If you want to keep the formula in the output pretty, you can call substitute on the whole call, which will interpolate the variable names, then call eval on the result to run it:

lm_eqn <- function(data, x, y){
    eval(substitute(
        lm(formula = y ~ x, data = data)
    ))
}

lm_eqn(iris, Sepal.Width, Sepal.Length)
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width, data = iris)    # <- pretty!
#> 
#> Coefficients:
#> (Intercept)  Sepal.Width  
#>      6.5262      -0.2234

Or to make it all really simple (and a lot more flexible), just pass a formula directly:

lm_frm <- function(data, formula){
    lm(formula, data)
}

lm_frm(iris, Sepal.Length ~ Sepal.Width)
#> 
#> Call:
#> lm(formula = formula, data = data)
#> 
#> Coefficients:
#> (Intercept)  Sepal.Width  
#>      6.5262      -0.2234

Wrapping the lm call in eval(substitute(...)) will fix the stored call structure with this approach, too.
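
As a minimal sketch of that last point (the name lm_frm2 is made up here for illustration), the wrapped version might look like the following. It assumes the data object is visible from the function's environment, as the built-in iris is, because eval() is run inside the function:

```r
# Hypothetical variant of lm_frm: substitute() replaces `formula` and `data`
# with the expressions the caller supplied, and eval() then runs that call.
lm_frm2 <- function(data, formula){
    eval(substitute(
        lm(formula, data)
    ))
}

fit <- lm_frm2(iris, Sepal.Length ~ Sepal.Width)

# The stored call now holds the caller's expressions rather than the
# argument names `formula` and `data`.
fit$call
```

With this, the stored call should show Sepal.Length ~ Sepal.Width and iris, the same as in the first example.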

Upvotes: 4

Roman

Reputation: 4999

Approach without quotes:

> lm_eqn(iris, Sepal.Length, Sepal.Width)

Call:
lm(formula = dep ~ indep, data = df_lm)

Coefficients:
(Intercept)        indep  
    3.41895     -0.06188  

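Caveat: Passing object names without quotes is visually pleasant, but generally frowned upon because it relies on non-standard evaluation, which can break when the function is called from inside another function or with computed names.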

Code

lm_eqn <- function(df_lm, indep, dep){
    # Capture the unquoted argument names as strings and extract the columns
    # by exact name (grep() would also match partial or regex-like names).
    indep <- df_lm[[deparse(substitute(indep))]]
    dep <- df_lm[[deparse(substitute(dep))]]

    lm(formula = dep ~ indep, data = df_lm)
}

Upvotes: 3

Rui Barradas

Reputation: 76663

You can use both quoted and unquoted column names with the following substitute trick, taken from the source of the function library(), which also accepts both.

lm_eqn <- function(df, indep, dep){
  indep <- as.character(substitute(indep))
  dep <- as.character(substitute(dep))
  fmla <- as.formula(paste(dep, indep, sep = "~"))
  lm(fmla, data = df)
}

lm_eqn(iris, 'Sepal.Length', 'Sepal.Width')
#
#Call:
#lm(formula = fmla, data = df)
#
#Coefficients:
# (Intercept)  Sepal.Length  
#     3.41895      -0.06188  
#

lm_eqn(iris, Sepal.Length, Sepal.Width)
#
#Call:
#lm(formula = fmla, data = df)
#
#Coefficients:
# (Intercept)  Sepal.Length  
#     3.41895      -0.06188  
#
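
As an alternative to pasting the formula together by hand, base R's reformulate() builds a formula from character strings. The sketch below is not part of the answer above: the function name lm_eqn_rf is made up for illustration, and it assumes the same argument order (df, indep, dep).

```r
# Hypothetical variant: build the formula with stats::reformulate()
# instead of paste() + as.formula().
lm_eqn_rf <- function(df, indep, dep){
  # as.character(substitute(x)) accepts both quoted and unquoted names
  indep <- as.character(substitute(indep))
  dep <- as.character(substitute(dep))
  fmla <- reformulate(termlabels = indep, response = dep)
  lm(fmla, data = df)
}

fit_quoted <- lm_eqn_rf(iris, 'Sepal.Length', 'Sepal.Width')
fit_bare <- lm_eqn_rf(iris, Sepal.Length, Sepal.Width)

# Both call styles should give the same coefficients
all.equal(coef(fit_quoted), coef(fit_bare))
```

As in the answer above, the stored call will still show fmla and df rather than the actual column names.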

Upvotes: 3

bobbel

Reputation: 2031

You can quote the input, and then use eval(as.name()) inside the function.

lm_eqn <- function(df, indep, dep){

  lm(formula = eval(as.name(dep)) ~ eval(as.name(indep)), data = df)
}

lm_eqn(iris, 'Sepal.Length', 'Sepal.Width')

Upvotes: 3
