Aaron - mostly inactive

Reputation: 37754

How to use NSE in dplyr to refer to one variable?

I want to write a function for use in a dplyr chain that arranges grouped data by a given variable and then checks that the variable consists of consecutive integers starting at 1 (e.g., 1,2,3,...). To clarify, I mean every integer in order, not just increasing integers, so 1,2,4,... should fail.
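To make the requirement concrete, here is a minimal base R sketch of the check on a plain vector. The name is_consecutive is hypothetical and not part of dplyr or lazyeval; it sorts first, then compares against 1..n.

```r
# Hypothetical helper: TRUE only if x, once sorted, is exactly 1, 2, ..., n.
is_consecutive <- function(x) {
  # as.numeric() on both sides so integer and double inputs compare equal
  identical(as.numeric(sort(x)), as.numeric(seq_along(x)))
}

is_consecutive(c(3, 1, 2))  # every integer from 1 to 3 is present
is_consecutive(c(1, 2, 4))  # 3 is missing, so this should fail the check
```

The first call returns TRUE and the second FALSE.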

The idea is to end up with something like this, which would raise an error if x were not 1,2,3,... within every group.

d %>% group_by(group) %>% check(x) 

I've written an SE version of this that seems to work, as follows, but am stuck on the NSE version.

check_ <- function(.data, var) {
  checkint <- function(x) { stopifnot(x == seq_along(x)) }
  do(.data, {
    . <- dplyr::arrange_(., var)
    checkint(lazyeval::lazy_eval(var, data=.))
    .
  })
}

In the documentation, it looks like I should be using lazy to process a single variable, but this doesn't work right when the variable I'm passing in also exists in the global environment.

checkX <- function(.data, var) {
  check_(.data, lazyeval::lazy(var))
}

d <- expand.grid(group=1:2, x=3:1)
x <- 5 ## put an "x" in the global environment
d %>% group_by(group) %>% checkX(x)

## Error: incorrect size (1), expecting : 3 

I do have an NSE version that seems to work, but calling lazy_dots feels wrong because I only ever want one variable.

check <- function(.data, ...) {
  check_(.data, lazyeval::lazy_dots(...)[[1]])
}

Upvotes: 2

Views: 115

Answers (1)

MrFlick

Reputation: 206207

Looks like lazyeval has been changing. The latest vignette doesn't even reference the lazy() function, and lazy() does seem to have problems with variables in scope (more on that at the bottom). There are now new functions being encouraged, though they still haven't made their way into all of the "tidyverse" yet.

It looks like the function you want is expr_find. If we define checkX as

checkX <- function(.data, var) {
  check_(.data, lazyeval::expr_find(var))
}

Then this will work

x <- 5
d %>% group_by(group) %>% checkX(x)

(or at least it does with lazyeval_0.2.0 and dplyr_0.5.0)

But going back to the first example from the old vignette, lazy() gives a different result once x exists:

library(lazyeval)
# `x` does not exist here
f <- function(x = a - b) {
  lazy(x)
}
f()
# <lazy>
#   expr: a - b
#   env:  <environment: 0x000000000663d618>
exists("x")
# [1] FALSE
f(x)
# <lazy>
#   expr: x
#   env:  <environment: R_GlobalEnv>
x <- 101
f(x)
# <lazy>
#   expr: 101
#   env:  <environment: R_GlobalEnv>

Or an even simpler example

# rm(x)
lazy(x)
# <lazy>
#   expr: x
#   env:  <environment: R_GlobalEnv>
x <- 100
lazy(x)
#  <lazy>
#   expr: 100
#   env:  <environment: R_GlobalEnv>

Somewhere it's evaluating the parameter x, so the symbol is never preserved in the lazy object if x exists in the environment it's coming from.
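For comparison, here is a minimal sketch using base R's substitute() instead of lazyeval. This is a different technique from the one above, shown only to illustrate that the symbol can be captured even when x exists in the calling environment. The function name capture_var is hypothetical.

```r
# Hypothetical helper: return the unevaluated expression the caller typed.
capture_var <- function(var) {
  substitute(var)
}

x <- 100               # an "x" in the calling environment
e <- capture_var(x)

is.name(e)             # the symbol is kept, not the value 100
deparse(e)

# The captured symbol can then be evaluated against a data frame,
# so the column x is found before the global x.
d <- data.frame(x = 3:1)
eval(e, d)
```

Here is.name(e) is TRUE, deparse(e) gives "x", and eval(e, d) returns the column 3 2 1 rather than 100. Unlike a lazy object, the result of substitute() carries no environment, so any lookup outside the data frame falls back to wherever eval() is called.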

Upvotes: 2
