Summing across rows of a data.table for specific columns with NA

Question

library(data.table)
TEST <- data.table(Time=c("0","0","0","7","7","7","12"),
             Zone=c("1","1","0","1","0","0","1"),
             quadrat=c(1,2,3,1,2,3,1),
             Sp1=c(NA,4,29,9,1,2,10),
             Sp2=c(NA,NA,11,15,32,15,10),
             Sp3=c(NA,0,1,1,1,1,0))

TEST[, SumAbundance := rowSums(.SD), .SDcols = 4:6]

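The code above returns NA for every row that contains at least one NA. What I want instead: if all three species columns are NA, SumAbundance should be NA. If only one or two values are NA, the sum should still be computed, ignoring the NA values.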

akrun · Accepted Answer

There are several options: either compute the rowSums first and then replace the result with NA for the rows where all values are NA, or create an index in i so that the sum is computed only for rows with at least one non-NA value.

library(data.table)
TEST[, SumAbundance := replace(rowSums(.SD, na.rm = TRUE),
           Reduce(`&`, lapply(.SD, is.na)), NA), .SDcols = 4:6]
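
The second option, indexing in i, is described above but not shown in code. A minimal sketch of how it might look, assuming a reduced three-row version of the data and two illustrative helper names (`cols`, `idx`) that are not part of the original answer:

```r
library(data.table)

# Reduced example data: row 1 is all NA, row 2 has one NA, row 3 has none
TEST <- data.table(Sp1 = c(NA, 4, 29),
                   Sp2 = c(NA, NA, 11),
                   Sp3 = c(NA, 0, 1))
cols <- c("Sp1", "Sp2", "Sp3")  # hypothetical name for the columns to sum

# Row numbers that have at least one non-NA value among the selected columns
idx <- TEST[, which(rowSums(!is.na(.SD)) > 0), .SDcols = cols]

# Assign only on those rows; rows not in idx are left as NA in the new column
TEST[idx, SumAbundance := rowSums(.SD, na.rm = TRUE), .SDcols = cols]
```

Because `:=` on a subset of rows fills the remaining rows of a new column with NA, the all-NA row ends up as NA without any explicit replacement step.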

Or a slightly more compact option:

TEST[, SumAbundance := (NA^!rowSums(!is.na(.SD))) *
             rowSums(.SD, na.rm = TRUE), .SDcols = 4:6]
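
The `NA^!` expression can look cryptic, so here is a small standalone illustration of the arithmetic it relies on. The vector names (`non_na_count`, `all_na`, `mask`, `sums`) are made up for this sketch; it relies on the fact that in R anything raised to the power 0 is 1, including NA:

```r
# Number of non-NA values per row, as rowSums(!is.na(.SD)) would return it
non_na_count <- c(0, 1, 3)

# `!` on a number is TRUE only for 0, so this flags the all-NA rows
all_na <- !non_na_count

# NA^TRUE is NA^1, which is NA; NA^FALSE is NA^0, which is 1
mask <- NA^all_na

# Sums as rowSums(.SD, na.rm = TRUE) would return them (0 for the all-NA row)
sums <- c(0, 4, 41)

# Multiplying turns the 0 of the all-NA row into NA and leaves the rest unchanged
result <- mask * sums
```

So the multiplier is NA exactly for rows without any non-NA value and 1 everywhere else.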

Or construct a function and reuse it:

rowSums_new <- function(dat) {
  fifelse(rowSums(is.na(dat)) != ncol(dat), rowSums(dat, na.rm = TRUE), NA_real_)
}
TEST[, SumAbundance := rowSums_new(.SD), .SDcols = 4:6]
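
As a quick check, the function can be run against the data from the question. This sketch also selects the columns by name with `patterns("^Sp")` instead of by position, which is a variation not used in the original answer and assumes that only the species columns start with "Sp":

```r
library(data.table)

TEST <- data.table(Time = c("0", "0", "0", "7", "7", "7", "12"),
                   Zone = c("1", "1", "0", "1", "0", "0", "1"),
                   quadrat = c(1, 2, 3, 1, 2, 3, 1),
                   Sp1 = c(NA, 4, 29, 9, 1, 2, 10),
                   Sp2 = c(NA, NA, 11, 15, 32, 15, 10),
                   Sp3 = c(NA, 0, 1, 1, 1, 1, 0))

# NA only when every value in the row is NA, otherwise the sum ignoring NAs
rowSums_new <- function(dat) {
  fifelse(rowSums(is.na(dat)) != ncol(dat), rowSums(dat, na.rm = TRUE), NA_real_)
}

# Select Sp1, Sp2, Sp3 by name rather than by column position
TEST[, SumAbundance := rowSums_new(.SD), .SDcols = patterns("^Sp")]

TEST$SumAbundance
# expected: NA 4 41 25 34 18 20
```

Selecting by name keeps the call working if columns are later added or reordered, as long as the naming assumption holds.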
