
using column names in functions that rely on quasiquotation

I am writing a custom function that should accept both unquoted and quoted (string) column names. I can implement this with rlang, but it fails when the strings are supplied through an expression such as colnames(mtcars)[6] rather than as literal strings.

Any ideas on how this can be resolved?

library(tidyverse)

# function
cor_foo <- function(data, x1, x2) {
  x1 <- rlang::ensym(x1)
  x2 <- rlang::ensym(x2)

  df <- dplyr::select(data, {{x1}}, {{x2}})

  cor(df %>% dplyr::pull({{x1}}), df %>% dplyr::pull({{x2}}))
}

# works
cor_foo(mtcars, wt, mpg)
#> [1] -0.8676594

# works
cor_foo(mtcars, "wt", "mpg")
#> [1] -0.8676594

# checking strings that will be passed to the function as arguments
colnames(mtcars)[1]
#> [1] "mpg"
colnames(mtcars)[6]
#> [1] "wt"

# doesn't work with these inputs
cor_foo(mtcars, colnames(mtcars)[6], colnames(mtcars)[1])
#> Error: Only strings can be converted to symbols

Created on 2019-11-12 by the reprex package (v0.3.0)

Upvotes: 0

Answers (2)

Artem Sokolov

Reputation: 13721

You're trying to mix standard and non-standard evaluation, which almost always results in ambiguous behavior. Consider the following variant of the data:

X <- mtcars %>% mutate(`colnames(mtcars)[6]` = 1:n(), `colnames(mtcars)[1]` = 1:n())

What should your function return in this case?

cor_foo(X, colnames(mtcars)[6], colnames(mtcars)[1])

If arguments 2 and 3 are interpreted with standard evaluation (SE), then they should be resolved to the strings "wt" and "mpg" before being passed down to cor_foo. On the other hand, if arguments 2 and 3 are meant to follow non-standard evaluation (NSE), then they should be treated as unevaluated expressions that are themselves the column names (here, the two columns just added to X).

My suggestion is to commit to either SE or NSE. rlang::ensym() bridges the two a little bit by working with both strings and symbols. However, it doesn't work with arbitrary expressions because it's ambiguous whether these expressions already contain the column name or need to be evaluated to obtain the column name.
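If you commit to SE, a minimal sketch could look like the following. The name cor_foo_se is hypothetical, and the function assumes both column arguments always arrive as length-one character strings, so bare names such as wt are no longer accepted.

```r
# Hypothetical SE-only variant: columns are always given as strings
cor_foo_se <- function(data, x1, x2) {
  # Reject anything that is not a single string
  stopifnot(is.character(x1), length(x1) == 1)
  stopifnot(is.character(x2), length(x2) == 1)

  # `[[` looks the column up by name using ordinary evaluation
  cor(data[[x1]], data[[x2]])
}

# The arguments are evaluated to "wt" and "mpg" before the function sees them
cor_foo_se(mtcars, colnames(mtcars)[6], colnames(mtcars)[1])
cor_foo_se(mtcars, "wt", "mpg")
```

This removes the ambiguity entirely, at the cost of the unquoted-name convenience.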

A solution that likely gives you the desired behavior is to drop ensym() in favor of enquo(). Note that {{.}} is shorthand for !!enquo(.), so you can simply drop the ensym lines:

cor_foo <- function(data, x1, x2) {
  df <- dplyr::select(data, {{x1}}, {{x2}})
  cor(df %>% dplyr::pull({{x1}}), df %>% dplyr::pull({{x2}}))
}

cor_foo(X, "mpg", "wt")
# [1] -0.8676594
cor_foo(X, mpg, wt)
# [1] -0.8676594
cor_foo(X, colnames(mtcars)[6], colnames(mtcars)[1])
# [1] -0.8676594
cor_foo(X, `colnames(mtcars)[6]`, `colnames(mtcars)[1]`)
# [1] 1

Note that this is a commitment to the NSE interpretation, and the user must use !! to force in-place evaluation of expressions:

cyl <- colnames(mtcars)[1]     # Effectively cyl <- "mpg"
cor_foo(X, cyl, wt)
# [1] 0.7824958
cor_foo(X, !!cyl, wt)
# [1] -0.8676594

Upvotes: 1

caldwellst

Reputation: 5956

You want to use enquo here. ensym doesn't capture the quoting environment; it tries to convert its argument directly into a symbol. That fails for colnames(mtcars)[6] and colnames(mtcars)[1], because they are calls rather than strings or symbols, which is what generates the error.

If we use enquo, we capture the expression together with its quoting environment as a quosure, which can be evaluated later. You can use the following to check what each one is doing:

cor_sym <- function(data, x1) {
  x1 <- rlang::ensym(x1)
  x1
}

cor_sym(mtcars, colnames(mtcars)[6])

# Run traceback on the error

cor_quo <- function(data, x1) {
  x1 <- rlang::enquo(x1)
  x1
}

cor_quo(mtcars, colnames(mtcars)[6])

You will see that cor_quo returns a quosure whose environment is the global environment. So if we use enquo instead of ensym, the quosure is evaluated and supplies the string value to the select and pull calls.

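cor_foo <- function(data, x1, x2) {
  # Capture each argument as a quosure (expression + environment)
  x1 <- rlang::enquo(x1)
  x2 <- rlang::enquo(x2)
  # x1 and x2 are already quosures, so unquote them with !! rather than {{ }}
  df <- dplyr::select(data, !!x1, !!x2)
  cor(df %>% dplyr::pull(!!x1), df %>% dplyr::pull(!!x2))
}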

cor_foo(mtcars, colnames(mtcars)[6], colnames(mtcars)[1])
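
To see why this works, the captured quosure can be taken apart directly. This is only an illustrative sketch: capture_quo is a hypothetical helper, not part of the function above.

```r
library(rlang)

# Hypothetical helper: capture the argument as a quosure and return it
capture_quo <- function(x) enquo(x)

q <- capture_quo(colnames(mtcars)[6])

quo_get_expr(q)  # the unevaluated call, colnames(mtcars)[6]
quo_get_env(q)   # the environment in which the call was written
eval_tidy(q)     # evaluating the quosure yields the string "wt"
```

The string produced by evaluating the quosure is what select and pull end up using to pick the column.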

The differences are explained in more depth, by people who understand this better than I do, in this related question: What is the difference between ensym and enquo when programming with dplyr?

Upvotes: 1
