Alon Hazan

Reputation: 300

alternative to ifelse in data.table

I have two very large data.tables, A and B. The code below works fine, but it is very slow.

temp2=ifelse(is.na(A) & is.na(B),FALSE,
               ifelse(!is.na(A) & is.na(V),TRUE,
                      ifelse(is.na(A) & !is.na(B),FALSE,
                             ifelse(A!=B,TRUE,FALSE))))

Is there a better alternative that would make the code run faster?

Upvotes: 2

Views: 635

Answers (1)

mgriebe

Reputation: 908

Since all you need returned is TRUE or FALSE, it does not seem like you need ifelse at all.

If I am reading this correctly (and assuming you meant B, not V), you want FALSE whenever A is NA, regardless of the value of B. So for TRUE to be returned, A must not be NA, and A must not equal B. However, if B is NA, testing A != B returns NA rather than TRUE, and you want TRUE when B is NA but A is not, so:

temp2 = (!is.na(A))&((A!=B)|is.na(B))

That should do the trick. If you did mean V, do you have three data.tables?
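To check the equivalence on a small case, here is a minimal sketch using plain vectors in place of the data.tables. The values are made up for illustration, and `nested` and `direct` are names introduced here, not from the question. It relies on R's three-valued logic, where NA | TRUE is TRUE and FALSE & NA is FALSE:

```r
# Small vectors covering each NA / non-NA combination (illustrative values)
A <- c(NA, 1, NA, 1, 1)
B <- c(NA, NA, 2, 1, 2)

# Nested ifelse from the question, with B in place of V
nested <- ifelse(is.na(A) & is.na(B), FALSE,
          ifelse(!is.na(A) & is.na(B), TRUE,
          ifelse(is.na(A) & !is.na(B), FALSE,
          ifelse(A != B, TRUE, FALSE))))

# Single vectorised logical expression
direct <- (!is.na(A)) & ((A != B) | is.na(B))

# Compare the two results element by element
identical(nested, direct)
```

The `is.na(B)` term is what rescues the case where B is NA: without it, `A != B` would leave an NA in the result wherever A is present and B is missing.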

Concerning timing:

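library(data.table)

# Two data.tables of 1e6 rows each, with values sampled from 1, 2 and NA
A <- data.table(v1 = sample(c(1, 2, NA), 1e6, replace = TRUE), v2 = sample(c(1, 2, NA), 1e6, replace = TRUE))
B <- data.table(v1 = sample(c(1, 2, NA), 1e6, replace = TRUE), v2 = sample(c(1, 2, NA), 1e6, replace = TRUE))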

system.time({temp1 = (!is.na(A))&((A!=B)|(is.na(B)))})
##   user  system elapsed 
##   0.41    0.00    0.41

system.time({temp2 = ifelse(is.na(A) & is.na(B), FALSE,
                     ifelse(!is.na(A) & is.na(B), TRUE,
                     ifelse(is.na(A) & !is.na(B), FALSE,
                     ifelse(A != B, TRUE, FALSE))))})
##   user  system elapsed 
##   2.56    0.11    2.68 

all.equal(temp1,temp2)
## true

So it's about 6 times faster.

Upvotes: 1
