Reputation: 669
I am having issues refactoring dplyr code in a way that preserves non-standard evaluation. Let's say I want to create a function that always selects and renames a column.
library(lazyeval)
library(dplyr)
df <- data.frame(a = c(1, 2, 3), f = c(4, 5, 6), lm = c(7, 8, 9))
select_happy <- function(df, col) {
  col <- lazy(col)
  fo <- interp(~x, x = col)
  select_(df, happy = fo)
}
f <- function() {
  print('foo')
}
select_happy() is written according to the answer to the post "Refactor R code when library functions use non-standard evaluation". select_happy() works on column names that are either undefined or defined in the global environment. However, it runs into issues when a column name is also the name of a function in another namespace.
select_happy(df, a)
# happy
# 1 1
# 2 2
# 3 3
select_happy(df, f)
# happy
# 1 4
# 2 5
# 3 6
select_happy(df, lm)
# Error in eval(expr, envir, enclos) (from #4) : object 'datafile' not found
environment(f)
# <environment: R_GlobalEnv>
environment(lm)
# <environment: namespace:stats>
Calling lazy() on f and lm shows a difference in the lazy object: for lm the function definition appears in the lazy object, while for f it is just the name of the function.
lazy(f)
# <lazy>
# expr: f
# env: <environment: R_GlobalEnv>
lazy(lm)
# <lazy>
# expr: function (formula, data, subset, weights, na.action, method = "qr", ...
# env: <environment: R_GlobalEnv>
substitute() appears to work with lm.
select_happy <- function(df, col) {
  col <- substitute(col) # <- substitute() instead of lazy()
  fo <- interp(~x, x = col)
  select_(df, happy = fo)
}
select_happy(df, lm)
# happy
# 1 7
# 2 8
# 3 9
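For comparison, here is a minimal base-R sketch of what substitute() captures in this situation. The helper name capture_col is hypothetical and exists only for illustration; the sketch does not use dplyr or lazyeval, so it shows only the capture step, not what select_() does with the result.

```r
# Hypothetical helper: capture the argument expression without evaluating it.
capture_col <- function(col) substitute(col)

df <- data.frame(a = c(1, 2, 3), f = c(4, 5, 6), lm = c(7, 8, 9))

# The captured object is the bare symbol `lm`, not the stats::lm function.
expr <- capture_col(lm)
is_symbol <- is.name(expr)

# Evaluating the symbol with the data frame as the first lookup scope
# finds the column named lm before any function of the same name.
col_values <- eval(expr, df)
print(col_values)
```

Because the symbol is never looked up at capture time, a function with the same name elsewhere on the search path does not interfere.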
However, after reading the lazyeval vignette, it seems that lazy() should serve as a superior substitute for substitute(). Additionally, the regular select() function works just fine.
select(df, happy=lm)
# happy
# 1 7
# 2 8
# 3 9
My question is: how can I write select_happy() so that it works in all the ways that select() does? I'm having a hard time wrapping my head around the scoping and non-standard evaluation. More generally, what would be a solid strategy for programming with dplyr that could avoid these and other issues?
Edit
I tested out docendo discimus's solution and it worked great, but I would like to know if there is a way to use arguments, rather than dots, for the function. I think it is also important to be able to use interp(), because you might want to feed input into a more complicated formula, like in the post I linked to earlier. I think the core of the issue comes down to the fact that lazy_dots() captures the expression differently from lazy(). I would like to understand why they behave differently, and how to use lazy() to get the same functionality as lazy_dots().
g <- function(...) {
  lazy_dots(...)
}
h <- function(x) {
  lazy(x)
}
g(lm)[[1]]
# <lazy>
# expr: lm
# env: <environment: R_GlobalEnv>
h(lm)
# <lazy>
# expr: function (formula, data, subset, weights, na.action, method = "qr", ...
# env: <environment: R_GlobalEnv>
Even changing .follow_symbols to FALSE for lazy(), so that it matches the default of lazy_dots(), does not work.
lazy
# function (expr, env = parent.frame(), .follow_symbols = TRUE)
# {
# .Call(make_lazy, quote(expr), environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>
lazy_dots
# function (..., .follow_symbols = FALSE)
# {
# if (nargs() == 0)
# return(structure(list(), class = "lazy_dots"))
# .Call(make_lazy_dots, environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>
h2 <- function(x) {
  lazy(x, .follow_symbols = FALSE)
}
h2(lm)
# <lazy>
# expr: x
# env: <environment: 0xe4a42a8>
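The expr: x result above resembles what base R's substitute() does when an argument is passed through a wrapper: only the innermost name is seen. The sketch below is a base-R analogy under that assumption, not a description of lazyeval's internals; the function names inner and wrapper are hypothetical.

```r
# Hypothetical functions: inner() captures its argument expression,
# wrapper() forwards its own argument to inner().
inner <- function(x) substitute(x)
wrapper <- function(y) inner(y)

# Called directly, the caller's expression is captured.
direct <- inner(lm)

# Called through the wrapper, only the wrapper's argument name is captured.
nested <- wrapper(lm)

print(direct)
print(nested)
```

Recovering the original expression through a wrapper therefore requires following the argument back up the call chain, which is the kind of work an option like .follow_symbols would have to do.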
I just feel really kind of stuck as to what to do.
Upvotes: 9
Views: 3350
Reputation: 70266
One option may be to write select_happy almost the same way as the standard select function:
select_happy <- function(df, ...) {
  select_(df, .dots = setNames(lazy_dots(...), "happy"))
}
f <- function() {
  print('foo')
}
> select_happy(df, a)
happy
1 1
2 2
3 3
>
> select_happy(df, f)
happy
1 4
2 5
3 6
>
> select_happy(df, lm)
happy
1 7
2 8
3 9
Note that the standard select function is defined as:
> select
function (.data, ...)
{
select_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>
Also note that with this definition, select_happy accepts multiple columns to be selected, but will name any additional columns "NA":
> select_happy(df, lm, a)
happy NA
1 7 1
2 8 2
3 9 3
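The NA column name comes from setNames() being given fewer names than elements. A minimal base-R sketch, using a plain list of quoted symbols as a stand-in for the lazy_dots object (that stand-in is an assumption made for illustration):

```r
# Stand-in for two captured column expressions.
dots <- list(quote(lm), quote(a))

# Only one name is supplied for two elements, so the second name is NA.
named <- setNames(dots, "happy")
nms <- names(named)
print(nms)
```

Supplying one name per element, as in the modified version that follows, avoids the missing name.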
Of course you could make some modifications for such cases, for example:
select_happy <- function(df, ...) {
  dots <- lazy_dots(...)
  n <- length(dots)
  if (n == 1) newnames <- "happy" else newnames <- paste0("happy", seq_len(n))
  select_(df, .dots = setNames(dots, newnames))
}
> select_happy(df, f)
happy
1 4
2 5
3 6
> select_happy(df, lm, a)
happy1 happy2
1 7 1
2 8 2
3 9 3
Upvotes: 2