zhiwei li

Reputation: 1711

How to use non-standard evaluation in R

I want to refer to an object by its name (stored as a string) inside a for loop, but my attempts failed. I think the problem is related to non-standard evaluation in R.

I tried the method from this question: Standard evaluation and non-standard evaluation in R.

library(pROC)
library(tidyverse)
roc1 <- aSAH %>% roc(outcome, s100b)

plot(roc1)

a = 'roc1'

# method from Stack Overflow
plot(!!rlang::sym(a))

# Some other attempts
plot(sym(a))

plot(!sym(a))

plot(!!sym(a))

Any help will be highly appreciated!!

Upvotes: 1

Views: 762

Answers (2)

Justin Landis

Reputation: 2071

TLDR

Be careful with NSE: it is usually a design choice that favors interactive use over programmatic use.

As others have pointed out, plot uses standard evaluation, and thus !! will not work directly (!! only works in quoting functions).
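For a standard-evaluation function such as plot, a base R lookup is usually enough. The sketch below uses plain numeric vectors as hypothetical stand-ins for fitted objects like roc1, so it runs without pROC; the names roc1, roc2 and rocs are only illustrative.

```r
# Hypothetical stand-ins for fitted objects such as roc1
roc1 <- c(0.2, 0.5, 0.9)
roc2 <- c(0.1, 0.4, 0.8)

for (nm in c("roc1", "roc2")) {
  obj <- get(nm)        # look up the object bound to the name stored in nm
  plot(obj, main = nm)  # plot() receives the value, not the string
}

# Often simpler: keep the objects in a named list and index by name
rocs <- list(roc1 = roc1, roc2 = roc2)
for (nm in names(rocs)) {
  plot(rocs[[nm]], main = nm)
}
```

Keeping the objects in a named list avoids name lookups altogether, which tends to be easier to debug inside loops.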

Other users have already given answers to your post, so I think I will focus on when to use NSE in code - because I don't think you've expressed well enough how you plan to use it or why.

But first some general info

Non-standard evaluation is generally used to capture the end user's input in some way. A good portion of the tidyverse, dplyr in particular, has functions that use NSE. These functions 'quote' the user's input, capturing an expression to be evaluated in another context. With dplyr, the first argument is always a data.frame, and the remaining arguments are usually quoted and evaluated in the context of that data.frame (more specifically, a data mask) before the rest of the environment is searched.
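To make "capture, then evaluate in another context" concrete, here is a minimal base R sketch. The helper with_data is hypothetical and only imitates the idea of a data mask; it is not how dplyr is implemented.

```r
# Hypothetical helper: evaluates an unquoted expression inside a data frame
with_data <- function(data, expr) {
  captured <- substitute(expr)          # capture the code, not its value
  eval(captured, data, parent.frame())  # look in data first, then in the caller's environment
}

df <- data.frame(x = 1:3)
offset <- 10

# x is found in df, offset is found in the calling environment
result <- with_data(df, x + offset)
result
#> [1] 11 12 13
```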

A quick tangent before continuing: NSE didn't start with the tidyverse. Base R has functions that use NSE too, for example subset(), or anything that takes a formula.
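A small sketch of both base R cases, using a made-up data frame (the names df, nse_rows, se_rows and f are only illustrative):

```r
df <- data.frame(x = 1:5, y = letters[1:5])

# NSE: x is resolved inside df, so df$ is not needed
nse_rows <- subset(df, x > 3)

# Standard evaluation: the column has to be spelled out explicitly
se_rows <- df[df$x > 3, ]

# A formula stores its variables unevaluated; they do not need to exist yet
f <- height ~ weight
all.vars(f)
#> [1] "height" "weight"
```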

Why use NSE

Typically, I find there are only two reasons why I would write an NSE function: either I want to extend a function that already uses NSE, or I'm helping the end user follow the DRY (don't repeat yourself) principle.

dplyr is a great package for interactive use, but the second you want to make a custom function that uses NSE (to mirror the dplyr style), you have to start working with quosures, which can be a huge pain at first.

Here is an example of a function that would use NSE.

I often find myself needing to ensure that certain keys exist only once in the data. The duplicated function helps determine which keys are problematic, but I also want to return the rows that match those keys.

Consider the following reprex

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.6.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- tibble(
  lower_case = sample(letters, 30, T),
  upper_case = sample(LETTERS, 30, T)
)

df
#> # A tibble: 30 x 2
#>    lower_case upper_case
#>    <chr>      <chr>     
#>  1 d          G         
#>  2 x          P         
#>  3 x          A         
#>  4 s          E         
#>  5 s          R         
#>  6 h          W         
#>  7 w          X         
#>  8 m          E         
#>  9 b          V         
#> 10 j          H         
#> # ... with 20 more rows

If I wanted to view all rows that match a duplicated value of the column lower_case I would write in dplyr:

attempt1 <- df %>% 
  filter(lower_case %in% subset(lower_case, duplicated(lower_case)))

But I find that kind of cumbersome, so I could write a function that builds the correct call from my argument:

all_duplicates <- function(.data, .expr, verbose = F){
  #capture the expression as a quosure
  .expr <- enquo(.expr)
  
  #the repetitive expression
  new_quo <- quo(!!.expr %in% subset(!!.expr,duplicated(!!.expr)))
  
  if(verbose) print(new_quo)
  
  filter(.data, !!new_quo) #since filter is a quoting function
                           #I can use !! here
  
}

Now I need to write very little to do that specific task

attempt2 <- df %>%
  all_duplicates(lower_case)

identical(attempt1, attempt2)
#> [1] TRUE

I included a debugging verbose argument as a sanity check to make sure the new quo expression was formulated correctly.


all_duplicates(df, lower_case, verbose = T)
#> <quosure>
#> expr: ^(^lower_case) %in% subset(^lower_case, duplicated(^lower_case))
#> env:  0000000015FB9768
#> # A tibble: 17 x 2
#>    lower_case upper_case
#>    <chr>      <chr>     
#>  1 x          P         
#>  2 x          A         
#>  3 s          E         
#>  4 s          R         
#>  5 h          W         
#>  6 b          V         
#>  7 n          W         
#>  8 b          O         
#>  9 l          D         
#> 10 e          G         
#> 11 n          H         
#> 12 r          H         
#> 13 n          T         
#> 14 h          S         
#> 15 l          G         
#> 16 e          A         
#> 17 r          B
all_duplicates(df, upper_case, verbose = T)
#> <quosure>
#> expr: ^(^upper_case) %in% subset(^upper_case, duplicated(^upper_case))
#> env:  000000001A6B5E60
#> # A tibble: 19 x 2
#>    lower_case upper_case
#>    <chr>      <chr>     
#>  1 d          G         
#>  2 x          P         
#>  3 x          A         
#>  4 s          E         
#>  5 h          W         
#>  6 m          E         
#>  7 j          H         
#>  8 n          W         
#>  9 b          O         
#> 10 u          J         
#> 11 e          G         
#> 12 n          H         
#> 13 r          H         
#> 14 p          O         
#> 15 c          P         
#> 16 z          G         
#> 17 l          G         
#> 18 a          J         
#> 19 e          A
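As an aside, in rlang 0.4.0 and later the enquo() plus !! pattern can often be shortened with the embrace operator {{ }}. The variant below is a sketch under that assumption; the name all_duplicates_embrace and the toy data are hypothetical, and the verbose option is left out.

```r
library(dplyr)

# Hypothetical variant of all_duplicates() using the embrace operator
all_duplicates_embrace <- function(.data, .expr) {
  # {{ .expr }} captures the user's expression and injects it in one step
  filter(.data, {{ .expr }} %in% subset({{ .expr }}, duplicated({{ .expr }})))
}

toy <- tibble(key = c("a", "b", "a", "c"), value = 1:4)
dups <- all_duplicates_embrace(toy, key)  # keeps the two rows where key is "a"
dups
```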

Here's the kicker: let's say I have a list of data.frames and I want to apply this function to all of them, but the column of interest is different for each data.frame.

To do this you would need to convert the string into a symbol with sym(), as you were doing.
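A minimal sketch of that conversion inside a quoting function, next to the .data pronoun, which accepts a string directly. The data and the names col, by_sym and by_data are made up for illustration.

```r
library(dplyr)

toy <- tibble(a = c(1, 2, 2), b = c("x", "y", "y"))
col <- "a"  # column name held as a string

# string -> symbol, then unquote it inside the quoting function
by_sym <- filter(toy, (!!rlang::sym(col)) == 2)

# the .data pronoun can be subset with a string, no symbol needed
by_data <- filter(toy, .data[[col]] == 2)
```

Both calls keep the two rows where a equals 2; the .data form tends to be easier to read for people who have not seen !! before.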

This is the biggest downside of NSE: you need to take extra care if you plan to use such functions programmatically. More specifically, the END USER needs to remember how to use the function programmatically, which is not as intuitive as with a standard evaluation function.

I'd be remiss not to mention that dplyr has put a lot of development into tackling this problem, but that doesn't mean those tools will be available for use in your own functions (see across).

Here is a standard evaluation variant of the same thing, except it just expects any vector.

all_duplicates2 <- function(x) x %in% x[duplicated(x)]
#can still use it with dplyr
attempt3 <- df %>% filter(all_duplicates2(lower_case))

The programmatic example:

df2 <- tibble(
    lower_case = sample(letters, 30, T),
    upper_case = sample(LETTERS, 30, T)
  )

l <- list(df, df2)
#applying function on two data frames, but different columns
standard_l <- purrr::map2(
   l, c("upper_case","lower_case"), 
   ~.x[all_duplicates2(.x[[.y]]),])
non_standard_l <- purrr::map2(
   l, c("upper_case","lower_case"),
   ~all_duplicates(.x, !!sym(.y))) #doable with NSE, but not many
                                   # people know this trick

identical(standard_l, non_standard_l)
#> [1] TRUE

Created on 2021-05-06 by the reprex package (v2.0.0)

Upvotes: 3

mt1022

Reputation: 17289

The symbol returned by sym must be evaluated with eval or rlang::eval_tidy before it can be used in plot. For example:

a <- 1:10

x <- sym('a')

plot(eval(x))
plot(rlang::eval_tidy(x))

!! and !!! are unquoting operators: they force evaluation of their operand, but only inside tidyverse quoting functions, which plot is not.
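Outside a quoting function, !! is simply two applications of the logical negation operator, which is why plot(!!sym(a)) fails. A small base R sketch, using as.name() as a stand-in for sym():

```r
x <- 0
!!x  # two logical negations of a number
#> [1] FALSE

s <- as.name("x")  # a symbol, comparable to the result of sym("x")

# `!` cannot negate a symbol, so this raises an error; keep the message
msg <- tryCatch(!!s, error = function(e) conditionMessage(e))
msg

eval(s)  # evaluating the symbol returns the value bound to x
#> [1] 0
```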

Upvotes: 2
