R - rlang - Working with delayed evaluation

Question

I'm trying to solve the following use case:
I have a full data set (mydf) that I want to group with dplyr::group_by using different sets of variables, where each set is given by one row of a second data frame of variable combinations (mysplits). The issue is that mysplits stores the variable names as character strings.

There is a dplyr::group_by_ option, but I'm hoping to achieve this with rlang or similar tools.

mydf <- 
    data.frame(
        var1 = c('x', 'x', 'y', 'y'), 
        var2 = c('y', 'z', 'x', 'z'),
        var3 = c('a', 'b', 'a', 'b'),
        outcome = runif(4),
        stringsAsFactors = FALSE
    )

mysplits <-
     data.frame(
        g1 = c('var1', 'var2'),
        g2 = c('var2', 'var3'),
        stringsAsFactors = FALSE
     )

I'm looking for something similar to:

dlply(
    .data = mysplits, .variables = .(g1, g2),
    function(thissplit){
        group_by(mydf, f(thissplit$g1), f(thissplit$g2)) %>% summarise(mean(outcome))
    }
)

where f() is the missing component of my puzzle.

MrFlick · Accepted Answer

First, make sure your data.frame of names has character values rather than factor levels:

mysplits <-
  data.frame(
    g1 = c('var1', 'var2'),
    g2 = c('var2', 'var3'), 
    stringsAsFactors=FALSE
  )

Then you can use group_by_at with strings to choose column names. For example:

group_by_at(mydf, c("var1", "var2")) %>% summarise(mean(outcome))

You can loop over the values many different ways, but with tidyverse functions (here purrr::map2) rather than plyr functions, you can do:

map2(mysplits$g1, mysplits$g2, ~group_by_at(mydf, c(.x, .y)) %>% summarise(mean(outcome)))
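In dplyr 1.0.0 and later, group_by_at is marked as superseded in favour of across(). As a minimal sketch of the same loop in that style, assuming dplyr >= 1.0.0 and purrr are installed: summarise_by is a hypothetical helper name, and mean_outcome and the fixed seed are illustrative choices that are not part of the original answer.

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed so the illustration is reproducible

mydf <- data.frame(
  var1 = c("x", "x", "y", "y"),
  var2 = c("y", "z", "x", "z"),
  var3 = c("a", "b", "a", "b"),
  outcome = runif(4),
  stringsAsFactors = FALSE
)

mysplits <- data.frame(
  g1 = c("var1", "var2"),
  g2 = c("var2", "var3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group `data` by the columns named in the character
# vector `cols`, then take the mean of `outcome` within each group.
summarise_by <- function(data, cols) {
  data %>%
    group_by(across(all_of(cols))) %>%  # all_of() selects columns by name
    summarise(mean_outcome = mean(outcome), .groups = "drop")
}

# One summary table per row of mysplits (g1 and g2 are walked in parallel).
results <- map2(
  mysplits$g1, mysplits$g2,
  function(a, b) summarise_by(mydf, c(a, b))
)
```

Here results is a list with one summarised data frame per row of mysplits. all_of() raises an error if a requested name is not a column of the data, which may be preferable to silently ignoring it.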

If you insist on using group_by and rlang, you can convert the character strings to symbols with rlang::sym() and then unquote them with !! (here thissplit is one row of mysplits, as in the question), so something like:

group_by(mydf, !!rlang::sym(thissplit$g1), !!rlang::sym(thissplit$g2)) %>% summarise(mean(outcome))
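The line above relies on thissplit existing inside a loop. As a rough, self-contained sketch of that idea, the version below uses rlang::syms() with the splice operator !!! so that any number of grouping columns can be passed at once. It assumes dplyr >= 1.0.0 (for the .groups argument); group_means, mean_outcome and the fixed seed are hypothetical names and choices introduced only for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the illustration is reproducible

mydf <- data.frame(
  var1 = c("x", "x", "y", "y"),
  var2 = c("y", "z", "x", "z"),
  var3 = c("a", "b", "a", "b"),
  outcome = runif(4),
  stringsAsFactors = FALSE
)

mysplits <- data.frame(
  g1 = c("var1", "var2"),
  g2 = c("var2", "var3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn a character vector of column names into symbols
# and splice them into group_by() with !!!.
group_means <- function(data, group_names) {
  group_syms <- rlang::syms(group_names)  # character vector -> list of symbols
  data %>%
    group_by(!!!group_syms) %>%
    summarise(mean_outcome = mean(outcome), .groups = "drop")
}

# Apply the helper to each row of mysplits in turn.
results <- lapply(seq_len(nrow(mysplits)), function(i) {
  this_split <- unlist(mysplits[i, ], use.names = FALSE)  # e.g. c("var1", "var2")
  group_means(mydf, this_split)
})
```

Because the row of mysplits is flattened to a plain character vector first, the same helper should work unchanged if mysplits gains a third column such as g3.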
