pgitti

Reputation: 87

aggregate: pass the same function argument twice to FUN

Suppose I have some data with missing values, built from the mpg dataset in ggplot2:

library(ggplot2)  # provides the mpg dataset
year = mpg$year

year[mpg$year>2006] = NA

Now I want to aggregate by model and apply sum twice, once with na.rm = T and once with na.rm = F. Is there a way to pass the same argument twice, once to the first sum call and once to the second, without writing FUN = function(x) ...? Something like this:

aggregate(year, by = list(mpg$model), plyr::each(sum, sum), na.rm = T, na.rm = F)
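Passing na.rm twice cannot do what is intended because R matches each formal argument of a function to at most one supplied argument. The sketch below uses a hypothetical helper f (not part of the question) to show the error for an ordinary R function. How the exact call above behaves depends on how aggregate and plyr::each forward their ... arguments, so treat this as an illustration of the rule, not a trace of that call.

```r
# Hypothetical helper, used only to illustrate R's argument matching
f <- function(x, na.rm = FALSE) sum(x, na.rm = na.rm)

# Supplying the same named argument twice is rejected before f's body runs
msg <- tryCatch(
  f(c(1, NA), na.rm = TRUE, na.rm = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```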

Upvotes: 0

Views: 234

Answers (2)

G. Grothendieck

Reputation: 269451

Using mtcars, which is built into R, we first insert some NAs.

With the formula method of aggregate, set na.action = na.pass so that aggregate does not remove the rows containing NAs before calling FUN. Then pass the Sum function defined below, which returns both sums at once.

Note that the output of aggregate, a2, has two columns, the second of which is itself a two-column matrix. To get three ordinary columns, use a3 <- do.call("data.frame", a2) as shown below.

mtcars$mpg[1:3] <-  NA # insert some NA's
Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))
a2 <- aggregate(mpg ~ cyl, mtcars, FUN = Sum, na.action = na.pass); a2
##   cyl mpg.sum1 mpg.sum2
## 1   4       NA    270.5
## 2   6       NA     96.2
## 3   8    211.4    211.4

str(a2)
## 'data.frame':   3 obs. of  2 variables:
##  $ cyl: num  4 6 8
##  $ mpg: num [1:3, 1:2] NA NA 211.4 270.5 96.2 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:2] "sum1" "sum2"

a3 <- do.call("data.frame", a2); a3
##   cyl mpg.sum1 mpg.sum2
## 1   4       NA    270.5
## 2   6       NA     96.2
## 3   8    211.4    211.4

str(a3)
## 'data.frame':   3 obs. of  3 variables:
##  $ cyl     : num  4 6 8
##  $ mpg.sum1: num  NA NA 211
##  $ mpg.sum2: num  270.5 96.2 211.4
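To see why na.action = na.pass matters, the sketch below runs the same Sum function with the formula method's default na.action (na.omit) and with na.pass. It works on a copy d of mtcars (d is a name chosen here for illustration). With the default, the NA rows are dropped before Sum runs, so the two sums always agree.

```r
# Work on a copy so the built-in mtcars is not modified
d <- mtcars
d$mpg[1:3] <- NA  # same NAs as in the answer

Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))

# Default na.action for the formula method is na.omit:
# rows with NA mpg are dropped before Sum() sees the data
dropped <- aggregate(mpg ~ cyl, d, FUN = Sum)

# With na.pass the NA rows reach Sum(), so sum1 and sum2 can differ
kept <- aggregate(mpg ~ cyl, d, FUN = Sum, na.action = na.pass)

print(dropped)
print(kept)
```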

Using the data.frame method of aggregate is similar, except that na.action is no longer an argument and NAs are not removed by default.

aggregate(mtcars["mpg"], mtcars["cyl"], Sum)
##   cyl mpg.sum1 mpg.sum2
## 1   4       NA    270.5
## 2   6       NA     96.2
## 3   8    211.4    211.4

Alternatives

collap from the collapse package is similar to aggregate but accepts a list of functions. The package also supplies fsum, which removes NAs by default. summaryBy in the doBy package also accepts a list of functions. dplyr's summarize takes one named argument per summary instead of a list, and data.table can perform the aggregation using its own notation.

library(collapse)
collap(mtcars, mpg ~ cyl, c(sum, fsum), keep.col.order = FALSE)
##   cyl sum.mpg fsum.mpg
## 1   4      NA    270.5
## 2   6      NA     96.2
## 3   8   211.4    211.4

library(doBy)
summaryBy(mpg ~ cyl, mtcars, FUN = c(sum, function(x) sum(x, na.rm = TRUE)), 
  fun.names = c("sum1", "sum2"))
##   cyl mpg.sum1 mpg.sum2
## 1   4       NA    270.5
## 2   6       NA     96.2
## 3   8    211.4    211.4

library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarize(sum1 = sum(mpg), sum2 = sum(mpg, na.rm = TRUE), .groups = "drop")
## # A tibble: 3 x 3
##     cyl  sum1  sum2
##   <dbl> <dbl> <dbl>
## 1     4   NA  270. 
## 2     6   NA   96.2
## 3     8  211. 211. 

library(data.table)
as.data.table(mtcars)[, .(sum1 = sum(mpg), sum2 = sum(mpg, na.rm = TRUE)), by = cyl]
##    cyl  sum1  sum2
## 1:   6    NA  96.2
## 2:   4    NA 270.5
## 3:   8 211.4 211.4
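For comparison, a base-R sketch that needs no extra packages calls tapply once per na.rm setting. Arguments after FUN in tapply are passed on to sum, and the two results are bound side by side. The copy d and the column names sum1 and sum2 are choices made here for illustration.

```r
d <- mtcars
d$mpg[1:3] <- NA  # same NAs as in the answer

# One tapply() call per na.rm setting; arguments after FUN go to sum()
sums <- cbind(
  sum1 = tapply(d$mpg, d$cyl, sum),               # NAs propagate
  sum2 = tapply(d$mpg, d$cyl, sum, na.rm = TRUE)  # NAs dropped
)
print(sums)
```

The result is a plain matrix with one row per cyl value, not a data frame.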

Upvotes: 2

GKi

Reputation: 39647

Another option is to call aggregate inside Map or lapply, once for each na.rm setting.

x <- mtcars
x$mpg[1:3] <-  NA
Map(aggregate, list(x$mpg), list(list(x$cyl)), "sum", na.rm=c(TRUE, FALSE))
#[[1]]
#  Group.1     x
#1       4 270.5
#2       6  96.2
#3       8 211.4
#
#[[2]]
#  Group.1     x
#1       4    NA
#2       6    NA
#3       8 211.4
lapply(list(T=TRUE, F=FALSE), function(y) aggregate(x$mpg, x["cyl"], sum, na.rm=y))
#$T
#  cyl     x
#1   4 270.5
#2   6  96.2
#3   8 211.4
#
#$F
#  cyl     x
#1   4    NA
#2   6    NA
#3   8 211.4
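Both calls above return a list with one data frame per na.rm setting. If a single table is wanted, one way is to rename each summary column and merge the pieces on the grouping column. The names settings, sum_na_rm and sum_keep_na are illustrative choices, not part of the answer.

```r
x <- mtcars
x$mpg[1:3] <- NA

# One aggregate() call per na.rm setting; each result is named after its setting
settings <- c(sum_na_rm = TRUE, sum_keep_na = FALSE)
res <- lapply(names(settings), function(nm) {
  out <- aggregate(x["mpg"], x["cyl"], sum, na.rm = settings[[nm]])
  names(out)[names(out) == "mpg"] <- nm  # rename the summary column
  out
})

# Join the per-setting results on the grouping column
combined <- Reduce(function(a, b) merge(a, b, by = "cyl"), res)
print(combined)
```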

Or you can create a new sum function whose argument has a different name than na.rm.

Sum <- function(x, Na.rm, ...) sum(x, na.rm = Na.rm)
aggregate(x$mpg, x["cyl"], plyr::each(sum, Sum), na.rm = TRUE, Na.rm = FALSE)
#  cyl x.sum x.Sum
#1   4 270.5    NA
#2   6  96.2    NA
#3   8 211.4 211.4
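One caveat with this trick, assuming (as the output above suggests) that plyr::each forwards the same ... to every function it wraps: base sum then also receives Na.rm = FALSE. sum treats only an argument named exactly na.rm specially and adds any other argument to the total. FALSE counts as 0, so the result above is unaffected, but Na.rm = TRUE would silently add 1. The vector v below is made up for illustration.

```r
# sum() only treats an argument tagged exactly na.rm specially;
# any other argument, such as Na.rm, is summed as data
v <- c(1, 2, NA)
with_false <- sum(v, na.rm = TRUE, Na.rm = FALSE)  # FALSE is summed as 0
with_true  <- sum(v, na.rm = TRUE, Na.rm = TRUE)   # TRUE is summed as 1
print(c(with_false = with_false, with_true = with_true))
```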

Personally, though, I would prefer to write an anonymous function, even though the question asked to avoid this.

aggregate(x$mpg, x["cyl"], function(x) c(T = sum(x, na.rm = TRUE), F = sum(x)))
#  cyl   x.T   x.F
#1   4 270.5    NA
#2   6  96.2    NA
#3   8 211.4 211.4

Upvotes: 0
