Reputation: 3835
I would like to be able to aggregate by multiple columns and get in the results not only the finest aggregations (one value from each grouping column) but also the higher-level aggregations (one value from one grouping column, no restriction on other grouping columns, etc.). I believe in Oracle and Hive this is possible via "grouping sets" (also "cube" and "rollup" in Hive).
This code achieves what I'm looking for:
rbind.fill(ddply(mtcars, .(cyl), summarize, agg=mean(mpg)),
ddply(mtcars, .(cyl, am), summarize, agg=mean(mpg)))[, c(1,3,2)]
cyl am agg
1 4 NA 26.66364
2 6 NA 19.74286
3 8 NA 15.10000
4 4 0 22.90000
5 4 1 28.07500
6 6 0 19.12500
7 6 1 20.56667
8 8 0 15.05000
9 8 1 15.40000
But this is ugly in several ways. For my actual application the definition of the aggregations to perform is long and I would really rather not repeat it. Is there an elegant way to do this?
Upvotes: 4
Views: 1093
Reputation: 9825
As of v1.11.0, data.table
has functions for rollup
, cube
and groupingsets
. jangorecki provides an answer to this question in issue #1377:
library(data.table)
rollup(as.data.table(mtcars), j=.(agg=mean(mpg)), by=c("cyl", "am"))
Upvotes: 4
Reputation: 52677
I think this will do what you want:
library(plyr)
grp.cols <- c("vs", "am", "gear", "carb", "cyl")
do.call(
rbind.fill,
lapply(1:length(grp.cols), function(x) ddply(mtcars, grp.cols[1:x], summarize, agg=mean(mpg)))
)
Upvotes: 3