Tomas Greif

Reputation: 22623

aggregate() with multiple functions

Can I use aggregate() with several functions so that each aggregation is stored as a separate column rather than as part of a matrix? I want a data frame with columns Group.1, cyl.1, cyl.2, not Group.1, cyl.

# Only one function
> aggdata <- aggregate(mtcars["cyl"], by=list(mtcars$vs), FUN=mean, na.rm=TRUE)
> aggdata
  Group.1      cyl
1       0 7.444444
2       1 4.571429
> str(aggdata)
'data.frame':   2 obs. of  2 variables:
 $ Group.1: num  0 1
 $ cyl    : num  7.44 4.57
> 
# Two functions
> aggdata <- aggregate(mtcars["cyl"], by=list(mtcars$cyl), FUN=function(x) c(length(x), mean(x)))
> aggdata
  Group.1 cyl.1 cyl.2
1       4    11     4
2       6     7     6
3       8    14     8
> str(aggdata)
'data.frame':   3 obs. of  2 variables:
 $ Group.1: num  4 6 8
 $ cyl    : num [1:3, 1:2] 11 7 14 4 6 8
> aggdata$cyl
     [,1] [,2]
[1,]   11    4
[2,]    7    6
[3,]   14    8

Upvotes: 5

Views: 1458

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

Wrap it in do.call(data.frame, ...):

aggdata <-aggregate(mtcars["cyl"], by=list(mtcars$cyl), 
                    FUN=function(x) c(length(x),mean(x)))
do.call(data.frame, aggdata)
#   Group.1 cyl.1 cyl.2
# 1       4    11     4
# 2       6     7     6
# 3       8    14     8
str(do.call(data.frame, aggdata))
# 'data.frame': 3 obs. of  3 variables:
#  $ Group.1: num  4 6 8
#  $ cyl.1  : num  11 7 14
#  $ cyl.2  : num  4 6 8

After searching a little bit, I just found the source of my answer. There are a few other questions similar to this, but this was where I learned the do.call(data.frame, ...) approach.

(It occurred to me what to search for after @James posted the same answer as I did and then deleted his.)
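
As a hedged aside on why this works: a data frame is a list of columns, so do.call(data.frame, aggdata) passes each column to data.frame() as a separate argument, and data.frame() splits a matrix argument into one column per matrix column. The sketch below also returns a named vector from FUN so the flattened columns get readable names; the names n and mean are my own choice for illustration, not part of the original answer.

```r
# aggregate() stores a multi-value FUN result as a single matrix column.
agg <- aggregate(mtcars["cyl"], by = list(mtcars$cyl),
                 FUN = function(x) c(n = length(x), mean = mean(x)))
ncol(agg)           # 2: Group.1 plus one matrix column named cyl
is.matrix(agg$cyl)  # TRUE

# do.call() hands each column to data.frame(), which expands the matrix.
flat <- do.call(data.frame, agg)
names(flat)         # "Group.1" "cyl.n" "cyl.mean"
```

Naming the vector elements inside FUN avoids a separate renaming step after flattening.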

Upvotes: 9

eddi

Reputation: 49448

Here's a different idea - switch to data.table instead:

library(data.table)
dt = data.table(mtcars)

dt[, list(.N, mean(cyl)), by = cyl]
#   cyl  N V2
#1:   6  7  6
#2:   4 11  4
#3:   8 14  8
# note, data.table is smart enough not to copy cyl needlessly
# when you're grouping by it, so if you attempt to get length(cyl), you'll get 1
# since cyl is just a number in each 'by' group

str(dt[, list(.N, mean(cyl)), by = cyl])
#Classes ‘data.table’ and 'data.frame':  3 obs. of  3 variables:
# $ cyl: num  6 4 8
# $ N  : int  7 11 14
# $ V2 : num  6 4 8
# - attr(*, ".internal.selfref")=<externalptr> 
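
If the default V2 name is unwanted, the list elements can be named inside j. This is a minimal sketch, assuming the data.table package is installed; it summarises mpg instead of cyl (my substitution, to make the mean less trivial), and the column names n and mean_mpg are made up for illustration.

```r
library(data.table)

dt <- as.data.table(mtcars)

# Named elements in list() become the column names of the result.
res <- dt[, list(n = .N, mean_mpg = mean(mpg)), by = cyl]
names(res)   # "cyl" "n" "mean_mpg"
```

Each aggregate is an ordinary column here, so no flattening step is needed.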

Upvotes: 6
