Reputation: 72779
Using dplyr, I'd like to summarize [sic] by a variable that I can vary (e.g. in a loop or apply-style command).
Typing the names in directly works fine:
library(dplyr)
ChickWeight %>% group_by( Chick, Diet ) %>% summarise( mw = mean( weight ) )
But group_by wasn't written to take a character vector, so passing in results is harder.
v <- "Diet"
ChickWeight %>% group_by( c( "Chick", v ) ) %>% summarise( mw = mean( weight ) )
## Error
I'll post one solution, but I'm curious to see how others have solved this.
Upvotes: 7
Views: 650
Reputation: 21443
The underscore functions of dplyr could be useful for that:
ChickWeight %>% group_by_( "Chick", v ) %>% summarise( mw = mean( weight ) )
From the new features in dplyr 0.3:
You can now program with dplyr – every function that uses non-standard evaluation (NSE) also has a standard evaluation (SE) twin that ends in _. For example, the SE version of filter() is called filter_(). The SE version of each function has similar arguments, but they must be explicitly “quoted”.
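For readers on a newer dplyr: as far as I know, the underscore verbs were deprecated starting with dplyr 0.7.0, so the same idea is usually written differently now. The following is only a sketch, assuming dplyr 1.0.0 or later and that rlang is installed. The names group_vars, res_across and res_syms are made up for illustration.

```r
library(dplyr)

v <- "Diet"
group_vars <- c("Chick", v)   # hypothetical name: the grouping columns as strings

# Option 1 (assumes dplyr >= 1.0.0): pick the grouping columns by name
res_across <- ChickWeight %>%
  group_by(across(all_of(group_vars))) %>%
  summarise(mw = mean(weight), .groups = "drop")

# Option 2: turn the strings into symbols and splice them into group_by()
res_syms <- ChickWeight %>%
  group_by(!!!rlang::syms(group_vars)) %>%
  summarise(mw = mean(weight), .groups = "drop")
```

Both variants should give the same per-chick means as the group_by_() call above. The across() form avoids any explicit quoting, which tends to read more easily inside a loop or an apply-style call.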
Upvotes: 11
Reputation: 72779
Here's one solution and how I arrived at it.
What does group_by expect?
> group_by
function (x, ..., add = FALSE)
{
new_groups <- named_dots(...)
Down the rabbit hole:
> dplyr:::named_dots
function (...)
{
auto_name(dots(...))
}
<environment: namespace:dplyr>
> dplyr:::auto_name
function (x)
{
names(x) <- auto_names(x)
x
}
<environment: namespace:dplyr>
> dplyr:::auto_names
function (x)
{
nms <- names2(x)
missing <- nms == ""
if (all(!missing))
return(nms)
deparse2 <- function(x) paste(deparse(x, 500L), collapse = "")
defaults <- vapply(x[missing], deparse2, character(1), USE.NAMES = FALSE)
nms[missing] <- defaults
nms
}
<environment: namespace:dplyr>
> dplyr:::names2
function (x)
{
names(x) %||% rep("", length(x))
}
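To make the chain above concrete, here is a rough base-R imitation of what those helpers seem to do: capture the arguments without evaluating them, then name each unnamed one by its deparsed text. This is not dplyr's code. capture_dots and auto_name_sketch are hypothetical names, and the behaviour of dplyr's own dots() is an assumption, since its source is not shown here.

```r
# Hypothetical helper: capture arguments unevaluated
# (assumption: dplyr's dots() does something similar)
capture_dots <- function(...) eval(substitute(alist(...)))

# Name each unnamed expression by its deparsed text, following auto_names() above
auto_name_sketch <- function(x) {
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  missing <- nms == ""
  deparse2 <- function(e) paste(deparse(e, 500L), collapse = "")
  nms[missing] <- vapply(x[missing], deparse2, character(1), USE.NAMES = FALSE)
  names(x) <- nms
  x
}

out <- auto_name_sketch(capture_dots(Chick, Diet, w = weight / 2))
names(out)     # the names derived from the expressions
class(out[[1]]) # the first element is still an unevaluated symbol
```

The point is that group_by receives unevaluated expressions, not values, so anything passed programmatically has to arrive as a symbol or call.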
Using that information, how to go about crafting a solution?
# Naive solution fails:
ChickWeight %>% do.call( group_by, list( Chick, Diet ) ) %>% summarise( mw = mean( weight ) )
# Slightly cleverer:
do.call( group_by, list( x = ChickWeight, Chick, Diet, add = FALSE ) ) %>% summarise( mw = mean( weight ) )
## But still fails with,
## Error in do.call(group_by, list(x = ChickWeight, Chick, Diet, add = FALSE)) : object 'Chick' not found
The solution lies in quoting the arguments so their evaluation is delayed until they're in the environment that includes the x tbl:
do.call( group_by, list( x = ChickWeight, quote(Chick), quote(Diet), add = FALSE ) ) %>% summarise( mw = mean( weight ) )
## Bingo!
v <- "Diet"
# as.name( v ) builds the symbol Diet; substitute( a, list( a = v ) ) would leave the string "Diet" instead
do.call( group_by, list( x = ChickWeight, quote(Chick), as.name( v ), add = FALSE ) ) %>% summarise( mw = mean( weight ) )
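The quoting step can be checked in base R alone, without dplyr. This is a small sketch of what each construction produces when the column name starts out as a string. The names sym, sub and col are made up for illustration.

```r
v <- "Diet"

# as.name() converts the string into a symbol, the same object quote(Diet) gives
sym <- as.name(v)
identical(sym, quote(Diet))

# substitute() with a string value just drops the string in; no symbol is created
sub <- substitute(a, list(a = v))
is.character(sub)

# A symbol is only looked up when evaluated, here inside the data frame
col <- eval(sym, ChickWeight)
identical(col, ChickWeight$Diet)
```

So a symbol built from the string behaves like a hand-typed quote(Diet), while a bare string stays a constant and never refers to the column.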
Upvotes: 0