Bahgat Nassour

Reputation: 167

Adding a new column to a matrix conditional on the values in the other columns

I have this dataset

data
     [C1] [C2] [C3] [C4] [C5] [C6] [C7] [C8]
[1,]    5    1    2    1    4    2    1   NA
[2,]    4    1    3    4    1    1   NA    2
[3,]    3    4    6    7    1    1    2    2
[4,]    1    3   NA    1   NA    2   NA   NA
[5,]    1   NA    5   NA   NA    4    1    2
[6,]    1    4   NA   NA   NA    4    1    2
[7,]    1    4   NA   NA   NA    4    1    2

I want to add a new column, C9, that takes the value 1 (TRUE) if the corresponding row has the value 1 in column C2, C3, or C4, and 0 (FALSE) otherwise. I have tried this code:

# logical matrix: TRUE where C2, C3, or C4 equals 1
C9 <- data[, 2:4] == 1
# change the logical matrix into numeric
C9 <- C9 * 1
# sum each row to collapse the matrix into a vector
C9 <- rowSums(C9)
data <- cbind(data, C9)

The code works, but it takes a long time to run. Since I am a beginner in R, my question is: is there a more direct way to do this?

Upvotes: 0

Views: 1100

Answers (1)

jlhoward

Reputation: 59355

If I understand the question correctly, C9 must be 1 if at least one of C2, C3, or C4 is exactly 1, and 0 otherwise. Because the data contain NAs, and comparing NA with 1 yields NA rather than FALSE, the solution has to deal with NAs.
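
To see why the NAs matter, here is a small sketch on toy vectors (the values and variable names are made up for illustration and are not taken from the question's matrix). It shows how a comparison propagates NA and how na.rm=TRUE changes the result of any() and sum():

```r
# Toy vectors, for illustration only
x <- c(1, NA, 3)   # contains a 1 and an NA
y <- c(NA, 3)      # contains no 1, only an NA and a 3

cmp <- x == 1                                   # TRUE NA FALSE
any_with_match  <- any(x == 1)                  # TRUE: a single TRUE is enough
any_no_match    <- any(y == 1)                  # NA: no TRUE, and the NA is unresolved
any_no_match_rm <- any(y == 1, na.rm = TRUE)    # FALSE once the NA is dropped
sum_default     <- sum(x == 1)                  # NA
sum_rm          <- sum(x == 1, na.rm = TRUE)    # 1
```

In other words, without na.rm=TRUE a row that has no 1 but contains an NA would come out as NA instead of 0.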

This compares three approaches:

f.1 <- function() (rowSums(data[, 2:4] == 1, na.rm = TRUE) > 0) * 1L
f.2 <- function() {
  x <- rep(0L, nrow(data))
  x[data[, 2] == 1 | data[, 3] == 1 | data[, 4] == 1] <- 1L
  x
}
f.3 <- function() apply(data[, 2:4], 1, function(x) any(x == 1, na.rm = TRUE)) * 1L
library(microbenchmark)
microbenchmark(f.1(),f.2(),f.3(), times=1000)
# Unit: microseconds
#   expr    min     lq      mean  median       uq       max neval cld
#  f.1() 11.845 15.991  20.76593  18.952  22.5050   293.751  1000  a 
#  f.2() 10.660 14.806  44.43363  17.768  20.7290 25063.000  1000  a 
#  f.3() 81.137 91.797 121.80148 103.050 125.8515  2719.566  1000   b

identical(f.1(),f.2())
# [1] TRUE
identical(f.1(),f.3())
# [1] TRUE

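f.1() is your approach (more or less), f.2() is a very simple and direct approach, and f.3() is from the comment. As you can see from the median times, the simple/direct approach f.2() is fastest in this case, but only by a few percent over f.1(), and the cld column puts the two in the same group. The higher mean for f.2() is inflated by a few slow outlier runs (note its max of about 25 ms). f.3(), which uses apply(), is roughly five to six times slower.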
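
For completeness, here is a minimal sketch of how the f.1() logic could be used to append the column. It assumes the matrix is rebuilt by hand from the values shown in the question and that the columns are named C1 to C8; the dimnames call is my assumption about how the original matrix was set up.

```r
# Rebuild the 7 x 8 matrix from the question (values copied from the post)
data <- matrix(c(
  5,  1,  2,  1,  4, 2,  1, NA,
  4,  1,  3,  4,  1, 1, NA,  2,
  3,  4,  6,  7,  1, 1,  2,  2,
  1,  3, NA,  1, NA, 2, NA, NA,
  1, NA,  5, NA, NA, 4,  1,  2,
  1,  4, NA, NA, NA, 4,  1,  2,
  1,  4, NA, NA, NA, 4,  1,  2
), nrow = 7, byrow = TRUE,
dimnames = list(NULL, paste0("C", 1:8)))  # column names are assumed

# 1 if any of C2, C3, C4 equals 1 (NAs ignored), otherwise 0
C9 <- (rowSums(data[, 2:4] == 1, na.rm = TRUE) > 0) * 1L

# Append the new column; cbind() names it after the variable
data <- cbind(data, C9)
data[, "C9"]
# [1] 1 1 0 1 0 0 0
```

Note that the code in the question, which calls rowSums() without na.rm=TRUE and without the > 0 test, would return NA for rows 4 to 7 and a count of 2 for row 1 on this data, rather than a 0/1 flag.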

Why do you think this is too slow?

Upvotes: 1
