R overflow

Reputation: 1352

Writing an ifelse() in a faster way (with less memory)

I'm currently working on a large data set. The only thing I do in this task is preprocess the data.

When I run my code, I see that my computer's memory usage increases very quickly at this line:

binary <- ifelse(subset_variables1 == "0", 0, 1)

All this line should do is make all my values binary. Can this be done more quickly? Or is this already a good approach, in which case I just have to deal with the memory issues?

Upvotes: 4

Views: 494

Answers (2)

user3226167

Reputation: 3439

Here is a slower but slightly more general solution:

v <- rep(1,length(subset_variables1))
v[subset_variables1 =="0"] <- 0

and an ifelse for numeric vectors:

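ifelse_sign <- function(test, yes, no) {
    # Recycle scalar yes/no to the length of test
    if (length(yes) == 1) yes <- rep(yes, length(test))
    if (length(no)  == 1) no  <- rep(no,  length(test))

    # Zero out the branch that does not apply at each position
    yes[!test] <- 0
    no[test]   <- 0

    # Adding test * 0 propagates any NA in test to the result
    yes + no + test * 0
}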

ifelse_sign(subset_variables1=="0",0,1)
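
As a quick sanity check, here is a minimal sketch showing that the index-assignment approach gives the same result as ifelse(). The vector `x` is made-up sample data for illustration, not the data from the question.

```r
# Hypothetical sample data, not from the original question
x <- c("0", "1", "0", "2", "1")

# Index-assignment approach: start with all ones, then zero out the matches
v <- rep(1, length(x))
v[x == "0"] <- 0

# Reference result from ifelse()
ref <- ifelse(x == "0", 0, 1)

# Both are double vectors with the same values
identical(v, ref)
```

Note that anything other than "0" (here "2") ends up as 1 in both versions.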

Upvotes: 0

clemens

Reputation: 6813

When working with logical values or conditions, you can use them with arithmetic operators, and TRUE and FALSE will be interpreted as 1 and 0. So +("0" == 0) returns 1, and 1 - ("0" == 0) returns 0.

If you have a vector like this:
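
To make the coercion concrete, here are a few small expressions you can paste into a console. This is only an illustrative sketch; the variable names are arbitrary.

```r
# Arithmetic operators coerce logical values to 0/1
plus_true   <- +TRUE        # 1
two_trues   <- TRUE + TRUE  # 2
minus_false <- 1 - FALSE    # 1

# Comparing a string with a number coerces the number to a string first,
# so "0" == 0 is TRUE
cmp      <- "0" == 0        # TRUE
as_one   <- +("0" == 0)     # 1
as_zero  <- 1 - ("0" == 0)  # 0

# The same coercion lets you count TRUE values with sum()
n_true <- sum(c(TRUE, FALSE, TRUE))  # 2
```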

set.seed(666)
subset_variables1 <- sample(c("0", "1"), 10000, replace = TRUE)

You can use 1 - (subset_variables1 == "0") to get the required result.

I compared it with a couple of suggestions from the comments, and it is the fastest of them in this benchmark:

library(microbenchmark)

microbenchmark(ifelse = ifelse(subset_variables1 == "0", 0, 1),
               as.numeric = as.numeric(subset_variables1),
               if_else = dplyr::if_else(subset_variables1 == "0", 0, 1),
               plus = 1 - (subset_variables1 == "0"),
               times = 1000
)

Unit: microseconds
       expr     min       lq     mean   median       uq      max neval
     ifelse 686.668 701.3440 977.0863 910.6570 1170.816 3222.192  1000
 as.numeric 631.813 642.5910 715.8687 677.3830  720.841 1819.925  1000
    if_else 347.409 377.0665 537.3344 482.7055  657.468 1603.241  1000
       plus  97.170  98.8845 129.9091 107.8545  146.303  741.557  1000
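
Since the question also asks about memory, here is a rough sketch comparing the storage type of the result. The as.integer() variant is my own suggestion and was not part of the benchmark above; it relies on integer vectors using 4 bytes per element where doubles use 8.

```r
set.seed(666)
x <- sample(c("0", "1"), 10000, replace = TRUE)

# 1 - logical gives a double vector (8 bytes per element)
as_double <- 1 - (x == "0")

# Converting the logical directly gives an integer vector (4 bytes per element)
as_integer <- as.integer(x != "0")

typeof(as_double)   # "double"
typeof(as_integer)  # "integer"

# Same values, different storage size
all(as_double == as_integer)
object.size(as_double)
object.size(as_integer)
```

Whether the smaller result matters in practice depends on how large the data is and on what the later steps expect, so measure on your real data.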

Upvotes: 10
