Reputation: 517
The sum function returns 0 when it is applied to an empty set, which is also what happens when na.rm = TRUE removes every value. Is there a simple way to make it return NA when all of the values are NA?
Here is a borrowed example:
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))
test
   name var1 var2 var3
1     A    1    1   NA
2     A    2    2   NA
3     A    3    3   NA
4     A   NA    4   NA
5     B    1    5    1
6     B    2    6    2
7     B    3    7    3
8     B   NA    8    4
9     C    1    9    5
10    C    2   10    6
11    C    3   11    7
12    C   NA   12    8
I would like the sum of each of the three variables per name. Here is what I tried:
var_to_aggr <- c("var1","var2","var3")
aggr_by <- "name"
summed <- aggregate(test[var_to_aggr],by=test[aggr_by],FUN="sum", na.rm = TRUE)
This gives me:
  name var1 var2 var3
1    A    6   10    0
2    B    6   26   10
3    C    6   42   26
But I need:
  name var1 var2 var3
1    A    6   10   NA
2    B    6   26   10
3    C    6   42   26
The sum for name A, var3 should be NA and not 0. Just to be clear, it should not be NA for name A, var1, where the set contains one NA but also valid values that should be summed. Any ideas?
I have been fiddling with na.action, but sum doesn't seem to accept it.
Upvotes: 5
Views: 1657
Reputation: 887088
You can try
f1 <- function(x) if(all(is.na(x))) NA_integer_ else sum(x, na.rm=TRUE)
aggregate(.~name, test, FUN=f1, na.action=NULL)
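As a quick check of the approach above, here is a minimal sketch that rebuilds the example data from the question and applies the same helper. The name res is only an illustrative variable, not part of the original answer.

```r
# Same helper as above: NA when every value is missing, otherwise the NA-skipping sum
f1 <- function(x) if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)

# Behaviour on plain vectors
f1(c(NA, NA, NA, NA))          # every value missing, so the helper returns NA
f1(c(1:3, NA))                 # some values missing, so the valid ones are summed
sum(c(NA, NA), na.rm = TRUE)   # plain sum() for comparison

# Example data from the question
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# na.action = NULL keeps rows containing NA so that f1 gets to see them
res <- aggregate(. ~ name, test, FUN = f1, na.action = NULL)
res
```

Note that all(is.na(x)) is also TRUE for a zero-length vector, so this helper returns NA for empty input as well.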
Or
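library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr; across() is the replacement
test %>%
  group_by(name) %>%
  summarise(across(everything(), f1))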
Or
library(data.table)
setDT(test)[, lapply(.SD, f1), name]
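One caveat: f1 always returns NA_integer_, so on a column of doubles it would produce an integer NA for all-NA groups and a double sum for the others. Whether that mix matters depends on the tool doing the grouping. A possible type-preserving variant is sketched below; f2 is a hypothetical name introduced here, not part of the original answer.

```r
# Hypothetical variant of f1: the NA it returns has the same type as the sum would have
f2 <- function(x) {
  s <- sum(x, na.rm = TRUE)                  # integer for integer input, double for double input
  if (all(is.na(x))) s[NA_integer_] else s   # indexing with NA yields an NA of s's own type
}

typeof(f2(c(NA_integer_, NA_integer_)))   # type of the NA for an all-missing integer vector
typeof(f2(c(NA_real_, NA_real_)))         # type of the NA for an all-missing double vector
f2(c(1.5, NA))                            # valid values are still summed
```

It can be dropped in wherever f1 is used in the calls above.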
Upvotes: 8