Reputation: 4564
Consider the following R code:
y1 <- dataset %>% dplyr::filter(W == 1)
This works, but there seems to some magic here. Usually, when we have an expression like foo(bar)
, we should be able to do this:
baz <= bar
foo(baz)
However, in the presented code snippet, we cannot evaluate W == 1
outside of dplyr::filter()
! W
is not a defined variable.
What's going on?
Upvotes: 2
Views: 117
Reputation: 1038
dplyr uses a concept called Non-standard Evaluation (NSE) to make columns from the data frame argument accessible to its functions without quoting or using dataframe$column
syntax. Basically:
[Non-standard evaluation] is a catch-all term that means they don’t follow the usual R rules of evaluation. Instead, they capture the expression that you typed and evaluate it in a custom way.1
In this case, the custom evaluation takes the argument(s) given to dplyr::filter
, and parses them so that W
can be used to refer to the dataset$W
. The reason that you can't then take that variable and use it elsewhere is that NSE is only applied to the scope of the function.
NSE makes a trade-off: functions which modify scope are less safe and/or unusable in programming where you're building a program that uses functions to modify other functions:
This is an example of the general tension between functions that are designed for interactive use and functions that are safe to program with. A function that uses substitute() might reduce typing, but it can be difficult to call from another function.2
For example, if you wanted to write a function which would use the same code, but swap out W == 1
for W == 0
(or some completely different filter), NSE would make that more difficult to accomplish.
In 2017 the tidyverse started to build a solution to this in tidy evaluation.
Upvotes: 2