Aaron Schumacher
Aaron Schumacher

Reputation: 3835

multi-level aggregations (like "grouping sets") via ddply or other R function?

I would like to be able to aggregate by multiple columns and get in the results not only the finest aggregations (one value from each grouping column) but also the higher-level aggregations (one value from one grouping column, no restriction on other grouping columns, etc.). I believe in Oracle and Hive this is possible via "grouping sets" (also "cube" and "rollup" in Hive).

This code achieves what I'm looking for:

rbind.fill(ddply(mtcars, .(cyl), summarize, agg=mean(mpg)),
           ddply(mtcars, .(cyl, am), summarize, agg=mean(mpg)))[, c(1,3,2)]

  cyl am      agg
1   4 NA 26.66364
2   6 NA 19.74286
3   8 NA 15.10000
4   4  0 22.90000
5   4  1 28.07500
6   6  0 19.12500
7   6  1 20.56667
8   8  0 15.05000
9   8  1 15.40000

But this is ugly in several ways. For my actual application the definition of the aggregations to perform is long and I would really rather not repeat it. Is there an elegant way to do this?

Upvotes: 4

Views: 1093

Answers (2)

dnlbrky
dnlbrky

Reputation: 9825

As of v1.11.0, data.table has functions for rollup, cube and groupingsets. jangorecki provides an answer to this question in issue #1377:

library(data.table)
rollup(as.data.table(mtcars), j=.(agg=mean(mpg)), by=c("cyl", "am"))

Upvotes: 4

BrodieG
BrodieG

Reputation: 52677

I think this will do what you want:

library(plyr)
grp.cols <- c("vs", "am", "gear", "carb", "cyl")
do.call(
  rbind.fill,
  lapply(1:length(grp.cols), function(x) ddply(mtcars, grp.cols[1:x], summarize, agg=mean(mpg)))
)

Upvotes: 3

Related Questions