user2980491

Reputation: 313

Summing many vectors row-wise (elementwise) while ignoring NA values

I am trying to create a new vector that is the sum of 35 other vectors. The problem is that there are lots of NA values, but for this particular use I want to treat them as zeros. Simply adding the vectors doesn't work, because if any of the 35 vectors has an NA at a given position, the sum at that position is NA. Here is an example of the problem:

col1<-c(NA,1,2,3)
col2<-c(1,2,3,NA)
col3<-c(NA,NA,2,3)
Sum<-col1+col2+col3
Sum
# [1] NA NA  7 NA

I want the result to be 1, 3, 7, 6.
I suppose I could create a new version of each vector with every NA replaced by 0, but that would be a lot of work for 35 vectors. Is there a simple function that will help me out?
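The replace-with-zero idea does not have to be done by hand for each vector. A minimal base R sketch, assuming all the vectors have the same length and can be collected into one list; `na_to_zero` is a hypothetical helper name, not a built-in function:

```r
col1 <- c(NA, 1, 2, 3)
col2 <- c(1, 2, 3, NA)
col3 <- c(NA, NA, 2, 3)

# Hypothetical helper: return a copy of v with every NA replaced by 0
na_to_zero <- function(v) {
  v[is.na(v)] <- 0
  v
}

# Put the vectors in one list (with 35 vectors, list all of them here),
# zero out the NAs in each one, then add them together pairwise with Reduce()
vecs <- list(col1, col2, col3)
total <- Reduce(`+`, lapply(vecs, na_to_zero))
total
# [1] 1 3 7 6
```

Only the copies inside `lapply()` are modified; the original vectors keep their NA values.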

Upvotes: 21

Views: 30831

Answers (3)

terraviva

Reputation: 41

For a tidyverse answer: sum() is normally a summary function that collapses everything you give it into a single value. If you group the data with rowwise(), mutate() evaluates sum() once per row, so you can pass it that row's columns together with the na.rm = TRUE argument:

library(dplyr)  # provides tibble(), rowwise(), mutate(), select(), pull() and %>%
t <- tibble(col1, col2, col3)
t %>% rowwise() %>% mutate(sum = sum(col1, col2, col3, na.rm = TRUE))

or, if you prefer, without the pipes:

t2 <- rowwise(t)
t2 <- mutate(t2, sum = sum(col1, col2, col3, na.rm = TRUE))

To extract the new sum column as a one-column table:

select(t2, sum)

or, to get it as a plain vector:

pull(t2, sum)
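Because rowwise() evaluates sum() separately for every row, it can get slow on large tables. A vectorized sketch, assuming dplyr 1.0.0 or later (for across()) and that every column to be added shares the prefix "col" (a naming assumption; adjust the selection for your own data):

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which introduced across()

col1 <- c(NA, 1, 2, 3)
col2 <- c(1, 2, 3, NA)
col3 <- c(NA, NA, 2, 3)
t <- tibble(col1, col2, col3)

# across() hands the selected columns to rowSums() as one data frame,
# so all row sums are computed in a single vectorized call
res <- t %>% mutate(sum = rowSums(across(starts_with("col")), na.rm = TRUE))
pull(res, sum)
# [1] 1 3 7 6
```

With 35 columns named col1 to col35, the same starts_with("col") selection picks them all up without listing each one.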

Upvotes: 0

joran

Reputation: 173627

Put them in a matrix first:

apply(cbind(col1,col2,col3),1,sum,na.rm = TRUE)
[1] 1 3 7 6

You can read about each function in R's built-in documentation: ?apply, ?cbind.

cbind stands for "column bind": it takes several vectors or arrays and binds them "by column" into a single matrix:

cbind(col1,col2,col3)
     col1 col2 col3
[1,]   NA    1   NA
[2,]    1    2   NA
[3,]    2    3    2
[4,]    3   NA    3

apply, well, applies a function (sum in this case) to either the rows or the columns of a matrix: the second argument, 1, selects rows (2 would select columns). Any further arguments, such as na.rm = TRUE, are passed on to sum, so the NA values are dropped before summing.
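To make the MARGIN argument and the argument forwarding concrete, here is a small sketch on the same data; the variable names `m`, `row_totals`, `row_totals2` and `col_totals` are just illustrative:

```r
col1 <- c(NA, 1, 2, 3)
col2 <- c(1, 2, 3, NA)
col3 <- c(NA, NA, 2, 3)
m <- cbind(col1, col2, col3)

# MARGIN = 1: one sum per row; na.rm = TRUE is forwarded to sum() via apply()'s ...
row_totals <- apply(m, 1, sum, na.rm = TRUE)

# The same computation written with an explicit anonymous function
row_totals2 <- apply(m, 1, function(row) sum(row, na.rm = TRUE))

# MARGIN = 2: one sum per column instead
col_totals <- apply(m, 2, sum, na.rm = TRUE)

row_totals  # [1] 1 3 7 6
col_totals  # col1 = 6, col2 = 6, col3 = 5
```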

Upvotes: 7

IRTFM

Reputation: 263411

Could also have used the rowSums function:

rowSums(cbind(col1, col2, col3), na.rm = TRUE)
#[1] 1 3 7 6

?rowSums   # also has colSums described on same help page
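With 35 vectors, typing every name into cbind() is tedious. A sketch that collects them by name, assuming they live in the current environment and follow a col1, col2, ... naming scheme (an assumption about the asker's data):

```r
col1 <- c(NA, 1, 2, 3)
col2 <- c(1, 2, 3, NA)
col3 <- c(NA, NA, 2, 3)

# Collect the vectors by name instead of typing each one into cbind();
# for vectors named col1 ... col35 this would be paste0("col", 1:35)
vecs <- mget(paste0("col", 1:3), envir = environment())
totals <- rowSums(do.call(cbind, vecs), na.rm = TRUE)
totals
# [1] 1 3 7 6

# With na.rm = TRUE, a row that is entirely NA sums to 0 rather than NA
rowSums(cbind(c(NA, 1), c(NA, 2)), na.rm = TRUE)
# [1] 0 3
```

The second call matters if some positions are missing in every vector: they come out as 0, which matches "treat NA as zero" but may hide rows with no data at all.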

Upvotes: 37
