Reputation: 87
Suppose I have some random data:
library(ggplot2) # provides the mpg data set
year = mpg$year
year[mpg$year > 2006] = NA
Now I want to aggregate and apply the sum function twice, once with na.rm = T and once with na.rm = F. Is there a way to pass the same argument twice, once for the first sum call and once for the second (without using FUN = function(x))? Something like this:
aggregate(year, by = list(mpg$model), plyr::each(sum, sum), na.rm = T, na.rm = F)
Upvotes: 0
Views: 234
Reputation: 269451
Using mtcars, which is built into R, we insert some NA's.
Now, using the formula method of aggregate, set na.action = na.pass in aggregate to prevent it from automatically removing NA's. Then use the Sum function defined below.
Note that the output of aggregate, a2, will have two columns, where the second column is itself a two-column matrix. If we want three columns, use a3 <- do.call("data.frame", a2) as shown below.
mtcars$mpg[1:3] <- NA # insert some NA's
Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))
a2 <- aggregate(mpg ~ cyl, mtcars, FUN = Sum, na.action = na.pass); a2
## cyl mpg.sum1 mpg.sum2
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
str(a2)
## 'data.frame': 3 obs. of 2 variables:
## $ cyl: num 4 6 8
## $ mpg: num [1:3, 1:2] NA NA 211.4 270.5 96.2 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "sum1" "sum2"
a3 <- do.call("data.frame", a2); a3
## cyl mpg.sum1 mpg.sum2
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
str(a3)
## 'data.frame': 3 obs. of 3 variables:
## $ cyl : num 4 6 8
## $ mpg.sum1: num NA NA 211
## $ mpg.sum2: num 270.5 96.2 211.4
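To make the difference between a2 and a3 concrete, here is a minimal sketch that checks the matrix column and reaches its parts in both forms. It works on a copy called d, a name chosen only for this sketch, so that mtcars itself is left alone.

```r
d <- mtcars                       # work on a copy; `d` is a name chosen for this sketch
d$mpg[1:3] <- NA                  # same NA's as above
Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))

a2 <- aggregate(mpg ~ cyl, d, FUN = Sum, na.action = na.pass)
is.matrix(a2$mpg)                 # the second column is a matrix
a2$mpg[, "sum2"]                  # index the matrix column by name

a3 <- do.call("data.frame", a2)   # flatten into ordinary columns
a3$mpg.sum2                       # same values, now a plain numeric column
```

Which form is more convenient probably depends on what comes next: the matrix column keeps related results together, while the flat form is easier to pass to code that expects ordinary columns.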
Using the data.frame method of aggregate is similar, except that na.action is no longer an argument and NA's are not removed by default.
aggregate(mtcars["mpg"], mtcars["cyl"], Sum)
## cyl mpg.sum1 mpg.sum2
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
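For contrast, here is a sketch of what the formula method does when na.action is left at its default, na.omit: rows with a missing mpg are dropped before Sum ever sees them, so both columns agree. The names d, omit, pass and dfm are chosen only for this sketch.

```r
d <- mtcars
d$mpg[1:3] <- NA
Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))

# Default na.action (na.omit): NA rows are removed first, so sum1 is never NA
omit <- aggregate(mpg ~ cyl, d, FUN = Sum)

# na.pass keeps the NA rows, so sum1 is NA wherever a group contains an NA
pass <- aggregate(mpg ~ cyl, d, FUN = Sum, na.action = na.pass)

# data.frame method: there is no na.action argument and NA rows are kept
dfm <- aggregate(d["mpg"], d["cyl"], Sum)
```

Comparing omit with pass shows why na.action = na.pass is needed in the formula method if the NA-propagating sum is actually wanted.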
collap from the collapse package is similar to aggregate but does allow a list of functions. It also supplies fsum, which defaults to removing NA's. summaryBy in the doBy package also supports a list of functions. dplyr's summarize uses separate arguments instead of a list, and data.table can perform aggregation using its own notation.
library(collapse)
collap(mtcars, mpg ~ cyl, c(sum, fsum), keep.col.order = FALSE)
## cyl sum.mpg fsum.mpg
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
library(doBy)
summaryBy(mpg ~ cyl, mtcars, FUN = c(sum, function(x) sum(x, na.rm = TRUE)),
  fun.names = c("sum1", "sum2"))
## cyl mpg.sum1 mpg.sum2
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarize(sum1 = sum(mpg), sum2 = sum(mpg, na.rm = TRUE), .groups = "drop")
## # A tibble: 3 x 3
## cyl sum1 sum2
## <dbl> <dbl> <dbl>
## 1 4 NA 270.
## 2 6 NA 96.2
## 3 8 211. 211.
library(data.table)
as.data.table(mtcars)[, .(sum1 = sum(mpg), sum2 = sum(mpg, na.rm = TRUE)), by = cyl]
## cyl sum1 sum2
## 1: 6 NA 96.2
## 2: 4 NA 270.5
## 3: 8 211.4 211.4
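If adding a package is not an option, a list of functions can be emulated in base R with a small wrapper. This is only a sketch: multi is a hypothetical helper, not part of base R or of any package mentioned above, and it assumes every function in the list returns a single number.

```r
d <- mtcars
d$mpg[1:3] <- NA

# Hypothetical helper: turn a named list of functions into one function
# that returns a named numeric vector, one element per function.
multi <- function(funs) {
  function(x) vapply(funs, function(f) f(x), numeric(1))
}

funs <- list(sum1 = sum, sum2 = function(x) sum(x, na.rm = TRUE))
res <- aggregate(d["mpg"], d["cyl"], multi(funs))
res <- do.call("data.frame", res)   # flatten the matrix column
res
```

The names of the list become the column suffixes, so the result should have the same cyl, mpg.sum1, mpg.sum2 layout as a3.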
Upvotes: 2
Reputation: 39647
Another option is to call aggregate from Map or lapply.
x <- mtcars
x$mpg[1:3] <- NA
Map(aggregate, list(x$mpg), list(list(x$cyl)), "sum", na.rm=c(TRUE, FALSE))
#[[1]]
# Group.1 x
#1 4 270.5
#2 6 96.2
#3 8 211.4
#
#[[2]]
# Group.1 x
#1 4 NA
#2 6 NA
#3 8 211.4
lapply(list(T=TRUE, F=FALSE), function(y) aggregate(x$mpg, x["cyl"], sum, na.rm=y))
#$T
# cyl x
#1 4 270.5
#2 6 96.2
#3 8 211.4
#
#$F
# cyl x
#1 4 NA
#2 6 NA
#3 8 211.4
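Both calls return a list with one data frame per na.rm value. If a single table is wanted, the pieces can be merged on the grouping column. A sketch, assuming x is prepared as above; the column names sum_na_rm and sum_keep_na are invented for illustration.

```r
x <- mtcars
x$mpg[1:3] <- NA

# One aggregate call per flag; rename the value column so the
# merged result has distinct, readable names.
pieces <- Map(function(flag, nm) {
  out <- aggregate(x["mpg"], x["cyl"], sum, na.rm = flag)
  names(out)[2] <- nm
  out
}, c(TRUE, FALSE), c("sum_na_rm", "sum_keep_na"))

# Merge the per-flag data frames on the grouping column
combined <- Reduce(function(a, b) merge(a, b, by = "cyl"), pieces)
combined
```

Renaming before merging avoids the x.x and x.y suffixes that merge would otherwise add to identically named columns.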
Or you create a new sum function whose NA-removal argument has a different name than na.rm.
Sum <- function(x, Na.rm, ...) sum(x, na.rm = Na.rm)
aggregate(x$mpg, x["cyl"], plyr::each(sum, Sum), na.rm = TRUE, Na.rm = FALSE)
# cyl x.sum x.Sum
#1 4 270.5 NA
#2 6 96.2 NA
#3 8 211.4 211.4
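One caveat with this trick, sketched below: base sum has no Na.rm argument, so the extra Na.rm value ends up in its ... and is added to the total. With Na.rm = FALSE that adds 0 and is harmless, but swapping the two flags would add 1 to every group sum. The vector v and the wrapper sum_only are made up for this illustration.

```r
v <- c(1, 2, NA)

sum(v, na.rm = TRUE, Na.rm = FALSE)  # FALSE counts as 0, total unaffected
sum(v, na.rm = TRUE, Na.rm = TRUE)   # TRUE counts as 1, total is off by one

# A wrapper that swallows unknown arguments avoids the problem
# (`sum_only` is a hypothetical name used only in this sketch).
sum_only <- function(x, na.rm = FALSE, ...) sum(x, na.rm = na.rm)
sum_only(v, na.rm = TRUE, Na.rm = TRUE)
```

Passing such a wrapper to plyr::each in place of the bare sum should keep each flag confined to the function it was meant for.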
Personally, though, I would prefer to write an anonymous function (even though the question asked to avoid this).
aggregate(x$mpg, x["cyl"], function(x) c(T = sum(x, na.rm = TRUE), F = sum(x)))
# cyl x.T x.F
#1 4 270.5 NA
#2 6 96.2 NA
#3 8 211.4 211.4
Upvotes: 0