multi-level aggregations (like "grouping sets") via ddply or other R function?

Question

I would like to be able to aggregate by multiple columns and get in the results not only the finest aggregations (one value from each grouping column) but also the higher-level aggregations (one value from one grouping column, no restriction on other grouping columns, etc.). I believe in Oracle and Hive this is possible via "grouping sets" (also "cube" and "rollup" in Hive).

This code achieves what I'm looking for:

rbind.fill(ddply(mtcars, .(cyl), summarize, agg=mean(mpg)),
           ddply(mtcars, .(cyl, am), summarize, agg=mean(mpg)))[, c(1,3,2)]

  cyl am      agg
1   4 NA 26.66364
2   6 NA 19.74286
3   8 NA 15.10000
4   4  0 22.90000
5   4  1 28.07500
6   6  0 19.12500
7   6  1 20.56667
8   8  0 15.05000
9   8  1 15.40000

But this is ugly in several ways. For my actual application the definition of the aggregations to perform is long and I would really rather not repeat it. Is there an elegant way to do this?

BrodieG · Accepted Answer

I think this will do what you want:

library(plyr)
grp.cols <- c("vs", "am", "gear", "carb", "cyl")
do.call(
  rbind.fill,
  lapply(1:length(grp.cols), function(x) ddply(mtcars, grp.cols[1:x], summarize, agg=mean(mpg)))
)

multi-level aggregations (like "grouping sets") via ddply or other R function?

Answers (2)

Related Questions

multi-level aggregations (like &quot;grouping sets&quot;) via ddply or other R function?

Answers (2)

Related Questions

multi-level aggregations (like "grouping sets") via ddply or other R function?