data.table aggregation having trouble with NAs

Question

I'm using the data.table package to aggregate a panel with multiple observations in each time-period for a given location into a panel in which there is a unique observation for a time-period-location. However, I'm having trouble aggregating for observations with an NA. While this works well:

set.seed(123)
data.frame(name = c("a", "a", "a", "b", "b", "b"), t = rep(c(1,2),3), x1 = sample(0:10,6), x2 =  sample(0:10,6))
f1

  name t x1 x2
    a 1  3  5
    a 2  7  8
    a 1 10  4
    b 2  9  3
    b 1  6  6
    b 2  0  2

f2 <- setDT(f1)[, lapply(.SD, sum(x, na.rm = TRUE)), by = .(name,t), .SDcols = c("x1", "x2")]
f2

   name t x1 x2
    a 1 13  9
    a 2  7  8
    b 2  9  5
    b 1  6  6

Adding an NA prevents sum() from working:

f1[1,3] <- NA
f1
   name t x1 x2
    a 1 NA  5
    a 2  7  8
    a 1 10  4
    b 2  9  3
    b 1  6  6
    b 2  0  2

f2 <- setDT(f1)[, lapply(.SD, sum(x, na.rm = TRUE)), by = .(name,t), .SDcols = c("x1", "x2")]
f2
name t x1 x2
    a 1 NA  9
    a 2  7  8
    b 2  9  5
    b 1  6  6

Whereas the x1 value for a, 1 should be 10. Any idea why this is happening even with the na.rm argument inside? Thanks.

Jaap · Accepted Answer

This is an issue of placing the brackets in the right spot. You should use lapply(.SD, sum, na.rm = TRUE) instead of lapply(.SD, sum(x, na.rm = TRUE)). When using lapply, you have to add the extra parameters of the sum-function after the call for sum and not inside the call for sum. Furtherore, you don't need the (x)-part in sum(x). Because the structure of lapply is lapply(X, FUN, ...), the function specified in the FUN-part is automatically applied to the variable in the X-part.

So, your code should be:

setDT(f1)[, lapply(.SD, sum, na.rm = TRUE), by = .(name,t)]

which gives:

   name t x1 x2
1:    a 1  0 11
2:    a 2  5  9
3:    b 2 18 12
4:    b 1  1  4

Note: I left the .SDcols = c("x1", "x2") part out of the code as it is not needed in this case. Including it will give you the same result.

data.table aggregation having trouble with NAs

Answers (1)

Related Questions