Reputation: 1499
Let's say I'd like to write anscombe %>% lm_tidy("x1", "y1")
(Actually, I'd like to write anscombe %>% lm_tidy(x1, y1)
, where x1
and y1
are part of the data frame). So, as the following function seems working:
plot_gg <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
ggplot(df, aes(x = !!x, y = !!y)) + geom_point() +
geom_smooth(formula = y ~ x, method="lm", se = FALSE)
}
I started writing the following function:
lm_tidy_1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
fm <- y ~ x ##### I tried many stuff here!
lm(fm, data=df)
}
## Error in model.frame.default(formula = fm, data = df, drop.unused.levels = TRUE) :
## object is not a matrix
One comment in passing in column name as argument states that embrace {{...}}
is a shorthand notation for the quote-unquote pattern. Unfortunately, error messages were different in both situations:
lm_tidy_2 <- function(df, x, y) {
fm <- !!enquo(y) ~ !!enquo(x) # alternative: {{y}} ~ {{x}} with different errors!!
lm(fm, data=df)
}
## Error:
## ! Quosures can only be unquoted within a quasiquotation context.
This seems working (based on @jubas's answer but we're stuck with string handling and paste
):
lm_tidy_str <- function(df, x, y) {
fm <- formula(paste({{y}}, "~", {{x}}))
lm(fm, data=df)
}
Yet again, {{y}} != !!enquo(y)
. But it's worse: the following function breaks down with the same Quosure
error as earlier:
lm_tidy_str_1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
fm <- formula(paste(!!y, "~", !!x))
lm(fm, data=df)
}
{{y}} != !!enquo(y)
?lm
?EDIT: Sorry, there were left-overs from my many trials. I want to directly pass the data-variables (say x1
and y1
) to the function that is going to use them as formula components (such as lm
) and not their string versions ("x1"
and "y1"
): I try to avoid strings as long as possible and it's more streamlined from the user perspective.
Upvotes: 4
Views: 1264
Reputation: 77
Here's what I use:
fm <- as.formula(paste0(y, ' ~ ', x))
lm(fm, data=df)
See:
?as.formula
Upvotes: 1
Reputation: 948
Wrap the formula in "expr," then evaluate it.
library(dplyr)
lm_tidy <- function(df, x, y) {
x <- sym(x)
y <- sym(y)
fm <- expr(!!y ~ !!x)
lm(fm, data = df)
}
This function is equivalent:
lm_tidy <- function(df, x, y) {
fm <- expr(!!sym(y) ~ !!sym(x))
lm(fm, data = df)
}
Then
lm_tidy(mtcars, "cyl", "mpg")
gives
Call:
lm(formula = fm, data = .)
Coefficients:
(Intercept) cyl
37.885 -2.876
EDIT per comment below:
library(rlang)
lm_tidy_quo <- function(df, x, y){
y <- enquo(y)
x <- enquo(x)
fm <- paste(quo_text(y), "~", quo_text(x))
lm(fm, data = df)
}
You can then pass symbols as arguments
lm_tidy_quo(mtcars, cyl, mpg)
Upvotes: 1
Reputation: 79208
Consider:
lm_tidy_1 <- function(df, x, y) {
fm <- reformulate(as.character(substitute(x)), substitute(y))
lm(fm, data=df)
}
lm_tidy_1(iris, Species, Sepal.Length)
lm_tidy_1(iris, 'Species', Sepal.Length)
lm_tidy_1(iris, Species, 'Sepal.Length')
lm_tidy_1(iris, 'Species', 'Sepal.Length')
Edit:
If you need the formula to appear, change the call object:
lm_tidy_1 <- function(df, x, y) {
fm <- reformulate(as.character(substitute(x)), substitute(y))
res<-lm(fm, data=df)
res$call[[2]]<- fm
res
}
lm_tidy_1(iris, Species, Sepal.Length)
Call:
lm(formula = Sepal.Length ~ Species, data = df)
Coefficients:
(Intercept) Speciesversicolor Speciesvirginica
5.006 0.930 1.582
Upvotes: 5
Reputation: 9865
@BiranSzydek's answer is pretty good. However it has 3 downsides:
Call:
lm(formula = fm, data = .)
rlang
- though it is a great package.You can indeed solve this problem with pure base R!
R is actually under-the-hood a Lisp. It is suitable for such meta-programming tasks. The only downside of R is its horrible syntax. Especially when facing meta-programming, it is not as beautiful and as elegant like the Lisp languages. The syntax really can confuse a lot - as you experienced it yourself when trying to solve this problem.
The solution is to use substitute()
by which you can substitute code pieces in a quoted manner:
lm_tidy <- function(df, x, y) {
# take the arguments as code pieces instead to evaluate them:
.x <- substitute(x)
.y <- substitute(y)
.df <- substitute(df)
# take the code piece `y ~ x` and substitute using list lookup table
.fm <- substitute(y ~ x, list(y=.y, x=.x))
# take the code `lm(fm, data=df)` and substitute with the code pieceses defined by the lookup table
# by replacing them by the code pieces stored in `.fm` and `.df`
# and finally: evaluate the substituted code in the parent environment (the environment where the function was called!)
eval.parent(substitute(lm(fm, data=df), list(fm=.fm, df=.df)))
}
The trick is to use eval.parent(substitute( <your expression>, <a list which determines the evaluation lookup-table for the variables in your expression>))
.
Beware of scoping! As long as <your expression>
is constructed only using variables which are defined inside the function or inside the lookup-list given to substitute()
, there won't be any scoping problems! But avoid to refer to any other variables within <your expression>
! - So this is the only rule you have to obey to use eval()
/eval.parent()
safely in this context!
but even if, the eval.parent()
takes care, that the substituted code
is executed within the environment where this function was called.
Now, you can do:
lm_tidy(mtcars, cyl, mpg)
the output is now as desired:
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Coefficients:
(Intercept) cyl
37.885 -2.876
And we did this with pure base R!
The trick for safe use of eval()
is really that every variable in the substitute()
expression is defined/given inside the lookup tables for substitute()
or the function's argument. In other words: None of the replaced variables refers to any dangling variables outside the function definition.
plot_gg
functionSo following these rules, your plot_gg
function would be defined as:
plot_gg <- function(df, x, y) {
.x <- substitute(x)
.y <- substitute(y)
.df <- substitute(df)
.fm <- substitute( y ~ x, list(x=.x, y=.y))
eval.parent(substitute(
ggplot(df, aes(x=x, y=y)) + geom_point() +
geom_smooth(formula = fm, method="lm", se=FALSE),
list(fm=.fm, x=.x, y=.y, df=.df)
))
}
x
and y
as strings
lm_tidy_str <- function(df, x, y) {
.x <- as.name(x)
.y <- as.name(y)
.df <- substitute(df)
.fm <- substitute(y ~ x, list(y=.y, x=.x))
eval.parent(substitute(lm(fm, data=df), list(fm=.fm, df=.df)))
}
plot_gg_str <- function(df, x, y) {
.x <- as.name(x)
.y <- as.name(y)
.df <- substitute(df)
.fm <- substitute( y ~ x, list(x=.x, y=.y))
eval.parent(substitute(
ggplot(df, aes(x=x, y=y)) + geom_point() +
geom_smooth(formula = fm, method="lm", se=FALSE),
list(fm=.fm, x=.x, y=.y, df=.df)
))
}
lm_tidy_str(mtcars, "cyl", "mpg")
# Call:
# lm(formula = mpg ~ cyl, data = mtcars)
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
#
require(ggplot2)
plot_gg_str(mtcars, "cyl", "mpg")
Upvotes: 6