Reputation: 243
I'm trying to wrap my head around the different quo/unquo syntaxes and when each should be used.
I am mostly writing functions that take a dataframe and the columns to use as arguments, either to plot with ggplot or to summarize/manipulate data with dplyr (group_by, summarize, mutate, etc.). However, on occasion I also have to call a function that does not use NSE inside my overall function.
From what I have read, my understanding is that:
1) if I'm referencing a column in a dataframe then I don't need to capture the environment and I can use ensym or sym. Is this correct? Would there be an issue using enquo, or is it just not necessary?
2) if I use ensym, the user can technically enter either a string or a bare column name in the argument.
Based on this my typical function setup would look something like this:
library(tidyverse)
dataset <- mtcars
myfun <- function(dat, xvar, yvar, group){
  # either manipulate data
  x <- dat %>% group_by(!!ensym(group)) %>%
    mutate(new = !!ensym(xvar)*5) %>%
    summarize(medianx=median(!!ensym(xvar), na.rm=TRUE),
              median_new=median(new, na.rm=TRUE))
  # or plot data
  p <- ggplot(dat, aes(x=!!ensym(xvar), y=!!ensym(yvar))) +
    geom_point()
  # sometimes I need to reference the column in a non-NSE function
  median(dat[[xvar]]) # works if the argument is required to be a string
  # how would you reference this with a bare column argument? Convert ensym to string?
  median(dat[[?????]])
}
# both work with ensym, only the latter with sym
myfun(dataset, xvar=mpg, yvar=disp, group=cyl)
myfun(dataset, xvar="mpg", yvar="disp", group="cyl")
How would one convert the bare column argument or symbol to a string for use in the last line of myfun above? I tried rlang::as_string(!!ensym(xvar)) but it doesn't work.
Upvotes: 10
Views: 2412
Reputation: 13691
Your understanding is correct. sym/ensym is preferred when referencing a column in an existing data frame. enquo() will, of course, work as well, but it captures any arbitrary expression, allowing the user to specify things like mpg * cyl or log10(mpg + cyl)/2. If your downstream code assumes that xvar and yvar are single columns, having arbitrary expressions can lead to problems or unexpected behavior. In that sense, ensym() acts as an argument verification step when you expect a reference to a single column.
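To illustrate the difference in what each function accepts, here is a minimal sketch. The helper names capture_sym and capture_quo are hypothetical and exist only for this example; they are not part of rlang.

```r
library(rlang)

# Hypothetical helpers, named only for this illustration
capture_sym <- function(col) as_string(ensym(col))  # expects a single column
capture_quo <- function(col) as_label(enquo(col))   # accepts any expression

capture_sym(mpg)        # bare column name is accepted
capture_sym("mpg")      # a string is accepted too
capture_quo(mpg * cyl)  # enquo() happily captures an arbitrary expression

# ensym() signals an error when given anything other than a symbol or string
res <- tryCatch(capture_sym(mpg * cyl), error = function(e) "rejected")
```

Here the tryCatch() call only serves to show that the error is raised at capture time, before any downstream code runs.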
As for converting symbols to strings, one approach is to use deparse():
median(dat[[deparse(ensym(xvar))]])
To get rlang::as_string to work, you need to drop !!, because you want to convert the expression itself to a string, not what the expression is referring to (e.g., mpg, cyl, etc.):
median(dat[[rlang::as_string(ensym(xvar))]])
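Putting the pieces together, one possible pattern is to capture the argument once with ensym() and then reuse it both as a symbol (for NSE code) and as a string (for non-NSE code). This is a sketch under that assumption; col_median is a hypothetical function name, not something from the question or from any package.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: median of one column, passed bare or as a string
col_median <- function(dat, xvar) {
  xsym  <- ensym(xvar)       # capture once as a symbol
  xname <- as_string(xsym)   # plain string for non-NSE code
  list(
    # non-NSE route: index the data frame by the column name
    base  = median(dat[[xname]], na.rm = TRUE),
    # NSE route: unquote the symbol inside a dplyr verb
    dplyr = dat %>% summarize(med = median(!!xsym, na.rm = TRUE)) %>% pull(med)
  )
}

res_bare   <- col_median(mtcars, mpg)
res_string <- col_median(mtcars, "mpg")
```

Capturing the symbol once at the top of the function avoids repeating ensym() in every call and keeps the two routes guaranteed to refer to the same column.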
Upvotes: 6