user1357015

Reputation: 11686

How does dplyr pass in non-string parameters?

A lot of the time in dplyr, we do something like:

mydat %>% select(., mycol1, mycol2, mycol3)

However, mycol1, mycol2, and mycol3 are not strings but just text in R. How does the function know to convert them into strings?

For instance, if I were to do:

 dat <- data.frame(blue = rnorm(100), red= rnorm(100))
 mysum <- function(dat, x, y){
  browser()
  return (sum(dat$x)+ sum(dat$y))

 }

 mysum(dat, blue, red)

Upvotes: 1

Views: 127

Answers (1)

IRTFM

Reputation: 263332

Your function is always going to deliver 0 because the $ infix function uses non-standard evaluation of its right-hand side argument: dat$x looks for a column literally named x, finds none, and returns NULL, and sum(NULL) is 0. (As you point out, non-standard evaluation is a favorite mechanism in @hadley's functions. For me it's a barrier, but for many people it seems to be a welcome strategy.) If you write your function in that manner (using $) you will generally fail to get what you want:

 mysum(dat, blue, red)
[1] 0   # Wrong answer
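
To see why the result is 0, it may help to compare $ with [[ directly. This is only a small sketch on a made-up data frame (the fixed values and the variable x holding "blue" are my own assumptions, not from the question):

```r
# Small illustrative data frame with fixed values (assumed for this sketch)
dat <- data.frame(blue = c(1, 2, 3), red = c(4, 5, 6))
x <- "blue"

dat$x        # NULL: `$` does not evaluate x, it looks for a column named "x"
sum(dat$x)   # 0, because sum(NULL) is 0
dat[[x]]     # 1 2 3: `[[` evaluates x and uses its value, "blue"
```

So $ never looks at the value of x, while [[ does, which is why the character-argument version works.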

You said earlier that: "However, mycol1, mycol2, and mycol3 are not strings but just text in R." I guess you are trying to say that mycol1 and the others are not enclosed in quotes and so are not character literals. In R such "text" (a sequence of unquoted characters) is called a 'symbol' or a 'name'. (Up to this point we are not talking about anything to do with dplyr.) If you want to write a function that will deliver that sum, you would do so like this (avoiding the $ operation and passing the column names as character values):

 mysum <- function(dat, x, y){
   return (sum(dat[[x]]) + sum(dat[[y]]))
 }

 mysum(dat, 'blue', 'red')
[1] 19.16727

If you want to retrieve the argument name for a matched parameter you need to use the deparse( substitute(.))-maneuver:

 dat <- data.frame(blue = rnorm(10), red = rnorm(10))

 mysum2 <- function(dfrm, arg1, arg2){
     # capture the unevaluated argument expressions and convert them to strings
     a1 <- deparse(substitute(arg1))
     a2 <- deparse(substitute(arg2))
     sum(dfrm[[a1]]) + sum(dfrm[[a2]])
 }
 mysum2(dat, blue, red)
#[1] -0.5754979
 mysum(dat, "blue", "red")
#[1] -0.5754979
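
To make the two steps of that maneuver visible, here is a minimal sketch that returns what substitute() captures and what deparse() turns it into. The helper name show_arg is hypothetical, introduced only for illustration:

```r
# Hypothetical helper: reports what substitute() and deparse() produce
show_arg <- function(arg) {
  expr <- substitute(arg)   # the unevaluated expression the caller supplied
  list(class = class(expr), text = deparse(expr))
}

res <- show_arg(blue)   # no object named blue needs to exist; arg is never evaluated
res$class               # "name", i.e. a symbol
res$text                # "blue", a character string usable with [[
```

Because arg is never evaluated inside the function, R does not complain that blue is undefined.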

If you want to see how @hadley does it, then just type:

> dplyr::select
function (.data, ...) 
{
    select_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>

.... doesn't really deliver the answer, does it? So we will need to try this:

 help(pac=lazyeval)

... which has an accompanying vignette named "lazyeval::lazyeval" --> "Lazyeval: a new approach to NSE". Hadley argues that his lazyeval functions are superior to the traditional substitute because they carry forward their environments, and I suppose I do agree.
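
For comparison, the traditional substitute approach can be stretched to cover a select-like function with any number of unquoted columns. This is a base-R sketch only, not how dplyr implements select; the name my_select and the sample data are assumptions of mine, and unlike lazyeval it does not carry the caller's environment along:

```r
# Hypothetical base-R imitation of select(); not dplyr's implementation
my_select <- function(.data, ...) {
  # substitute(list(...)) yields the call list(blue, green); drop the leading `list`
  exprs <- as.list(substitute(list(...)))[-1]
  # convert each captured symbol to a column-name string
  cols <- vapply(exprs, deparse, character(1))
  .data[, cols, drop = FALSE]
}

dat <- data.frame(blue = 1:3, red = 4:6, green = 7:9)
picked <- my_select(dat, blue, green)
names(picked)   # "blue" "green"
```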

Upvotes: 4
