Reputation: 442
I'm trying to solve the following use case:
I have a full data set (mydf) which I want to dplyr::group_by with different variable sets, according to the entries of another data.frame of variable combinations (mysplits). The issue is that my mysplits data.frame contains the variable names as character strings.
There is a dplyr::group_by_ option, but I'm hoping to achieve this with the rlang functionality or something similar.
mydf <-
  data.frame(
    var1 = c('x', 'x', 'y', 'y'),
    var2 = c('y', 'z', 'x', 'z'),
    var3 = c('a', 'b', 'a', 'b'),
    outcome = runif(4),
    stringsAsFactors = FALSE
  )
mysplits <-
  data.frame(
    g1 = c('var1', 'var2'),
    g2 = c('var2', 'var3'),
    stringsAsFactors = FALSE
  )
I'm looking for something similar to:
dlply(
  .data = mysplits, .variables = .(g1, g2),
  function(thissplit){
    group_by(mydf, f(thissplit$g1), f(thissplit$g2)) %>% summarise(mean(outcome))
  }
)
where f() is the missing component of my puzzle.
Upvotes: 1
Views: 115
Reputation: 206382
First, make sure your data.frame of names holds character values rather than factor levels:
mysplits <-
  data.frame(
    g1 = c('var1', 'var2'),
    g2 = c('var2', 'var3'),
    stringsAsFactors = FALSE
  )
Then you can use group_by_at with strings to choose column names. For example:
group_by_at(mydf, c("var1", "var2")) %>% summarise(mean(outcome))
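As a rough, self-contained sketch of the same idea: the block below runs the group_by_at call on the question's data and, for comparison, the across(all_of(...)) spelling that dplyr 1.0.0 and later offer as the successor to the scoped group_by_at verb. The names group_vars_chr, res_at, res_across and mean_outcome are made up for illustration, and the seed is only there so the example is reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random outcome is reproducible

mydf <- data.frame(
  var1 = c("x", "x", "y", "y"),
  var2 = c("y", "z", "x", "z"),
  var3 = c("a", "b", "a", "b"),
  outcome = runif(4),
  stringsAsFactors = FALSE
)

# Hypothetical name: the grouping columns, given as plain strings
group_vars_chr <- c("var1", "var2")

# Scoped verb, as used in the answer
res_at <- mydf %>%
  group_by_at(group_vars_chr) %>%
  summarise(mean_outcome = mean(outcome), .groups = "drop")

# Same grouping written with across() + all_of() (requires dplyr >= 1.0.0)
res_across <- mydf %>%
  group_by(across(all_of(group_vars_chr))) %>%
  summarise(mean_outcome = mean(outcome), .groups = "drop")
```

Both calls should produce the same grouped summary; which one to prefer mostly depends on the dplyr version you have to support.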
You can loop over the values in many different ways, but using other tidyverse functions (here purrr::map2) rather than plyr functions, you can do
map2(mysplits$g1, mysplits$g2, ~group_by_at(mydf, c(.x, .y)) %>% summarise(mean(outcome)))
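For completeness, here is a minimal runnable version of that loop on the question's data. It assumes dplyr 1.0.0 or later (for the .groups argument); the results object, the mean_outcome column name and the "g1_g2" naming of the list elements are illustrative choices, not part of the original code.

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed so the random outcome is reproducible

mydf <- data.frame(
  var1 = c("x", "x", "y", "y"),
  var2 = c("y", "z", "x", "z"),
  var3 = c("a", "b", "a", "b"),
  outcome = runif(4),
  stringsAsFactors = FALSE
)

mysplits <- data.frame(
  g1 = c("var1", "var2"),
  g2 = c("var2", "var3"),
  stringsAsFactors = FALSE
)

# One summary per row of mysplits: .x is the g1 entry, .y the g2 entry
results <- map2(
  mysplits$g1, mysplits$g2,
  ~ mydf %>%
    group_by_at(c(.x, .y)) %>%
    summarise(mean_outcome = mean(outcome), .groups = "drop")
)

# Label each list element with the pair of grouping columns it used
names(results) <- paste(mysplits$g1, mysplits$g2, sep = "_")
```

The result is a named list with one summary table per split, so results[["var2_var3"]] retrieves the table grouped by var2 and var3.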
If you insist on using group_by and the rlang tools, you can convert character strings to symbols with rlang::sym() and then unquote those with !!, so something like
group_by(mydf, !!rlang::sym(thissplit$g1), !!rlang::sym(thissplit$g2)) %>% summarise(mean(outcome))
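A sketch of how the rlang route might look end to end, looping over the rows of mysplits. Instead of calling sym() once per column, it uses rlang::syms() with the splice operator !!!, so it works for any number of grouping columns. The helper summarise_split and the names results and mean_outcome are hypothetical, introduced only for this example.

```r
library(dplyr)
library(rlang)

set.seed(1)  # assumption: fixed seed so the random outcome is reproducible

mydf <- data.frame(
  var1 = c("x", "x", "y", "y"),
  var2 = c("y", "z", "x", "z"),
  var3 = c("a", "b", "a", "b"),
  outcome = runif(4),
  stringsAsFactors = FALSE
)

mysplits <- data.frame(
  g1 = c("var1", "var2"),
  g2 = c("var2", "var3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group `data` by the columns named in `group_names`
summarise_split <- function(data, group_names) {
  data %>%
    group_by(!!!syms(group_names)) %>%  # strings -> symbols, spliced into group_by
    summarise(mean_outcome = mean(outcome), .groups = "drop")
}

# Apply the helper to each row of mysplits, read as a character vector
results <- lapply(seq_len(nrow(mysplits)), function(i) {
  summarise_split(mydf, unlist(mysplits[i, ], use.names = FALSE))
})
```

This relies on mysplits holding character columns, as noted at the start of the answer; with factor columns, unlist() would not return the plain names that syms() expects.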
Upvotes: 1