bvowe
bvowe

Reputation: 3384

Summing across rows of a data.table for specific columns with NA

library(data.table)
TEST <- data.table(Time=c("0","0","0","7","7","7","12"),
             Zone=c("1","1","0","1","0","0","1"),
             quadrat=c(1,2,3,1,2,3,1),
             Sp1=c(NA,4,29,9,1,2,10),
             Sp2=c(NA,NA,11,15,32,15,10),
             Sp3=c(NA,0,1,1,1,1,0))

TEST[, SumAbundance := rowSums(.SD), .SDcols = 4:6]

If there are three NA then I think that SumAbundance should be NA. If there is 1 or 2 NA values then still compute the sum and ignore the NA.

Upvotes: 4

Views: 1211

Answers (2)

Cole
Cole

Reputation: 11255

Another option is to subset in i and then update in j:

library(data.table)
dt <- data.table(Time=c("0","0","0","7","7","7","12"),
                   Zone=c("1","1","0","1","0","0","1"),
                   quadrat=c(1,2,3,1,2,3,1),
                   Sp1=c(NA,4,29,9,1,2,10),
                   Sp2=c(NA,NA,11,15,32,15,10),
                   Sp3=c(NA,0,1,1,1,1,0))

dt[!(is.na(Sp1) & is.na(Sp2) &is.na(Sp3)),
     SumAbundance := rowSums(.SD, na.rm = TRUE),
     .SDcols = 4:6]

dt
#>    Time Zone quadrat Sp1 Sp2 Sp3 SumAbundance
#> 1:    0    1       1  NA  NA  NA           NA
#> 2:    0    1       2   4  NA   0            4
#> 3:    0    0       3  29  11   1           41
#> 4:    7    1       1   9  15   1           25
#> 5:    7    0       2   1  32   1           34
#> 6:    7    0       3   2  15   1           18
#> 7:   12    1       1  10  10   0           20

@Akrun was using Reduce to reduce the amount of typing. We could have used that approach in this i statement as well.

Upvotes: 2

akrun
akrun

Reputation: 886948

We can have several options for this i.e. either do the rowSums first and then replace the rows where all are NA or create an index in i to do the sum only for those rows with at least one non-NA.

library(data.table)
TEST[, SumAbundance := replace(rowSums(.SD, na.rm = TRUE),
           Reduce(`&`, lapply(.SD, is.na)), NA), .SDcols = 4:6]

Or slightly more compact option

TEST[, SumAbundance :=  (NA^!rowSums(!is.na(.SD))) * 
             rowSums(.SD, na.rm = TRUE), .SDcols = 4:6]

Or construct a function and reuse

rowSums_new <- function(dat) {
  fifelse(rowSums(is.na(dat)) != ncol(dat), rowSums(dat, na.rm = TRUE),  NA_real_)
    }
TEST[, SumAbundance := rowSums_new(.SD), .SDcols = 4:6]

Upvotes: 5

Related Questions