Reputation: 1352
I'm currently working on a big data set. The only thing I do in this task is preprocess the data.
When I run my code, I see that my computer's memory usage increases very quickly at this line:
binary <- ifelse(subset_variables1 == "0", 0, 1)
All that line should do is make all my values binary. Can this be done more quickly? Or is this already a good approach (and I just have to deal with the memory issues)?
Upvotes: 4
Views: 494
Reputation: 3439
Here is a slower but a bit more general solution
v <- rep(1,length(subset_variables1))
v[subset_variables1 =="0"] <- 0
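and an ifelse replacement for numeric vectors:
ifelse_sign <- function(test, yes, no) {
  # Recycle scalar branches to the length of the condition
  if (length(yes) == 1) yes <- rep(yes, length(test))
  if (length(no) == 1) no <- rep(no, length(test))
  yes[!test] <- 0      # zero out "yes" where the condition is FALSE
  no[test] <- 0        # zero out "no" where the condition is TRUE
  yes + no + test * 0  # test * 0 propagates NA from the condition
}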
ifelse_sign(subset_variables1=="0",0,1)
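One difference worth checking before swapping in the index-assignment version (the `v[...] <- 0` lines above) is how it treats missing values. The following is a minimal sketch on a made-up three-element vector `x`, which is not part of the original data: an `NA` in the logical index is simply skipped during assignment, so that position keeps the default of 1, whereas `ifelse` returns `NA` there.

```r
# Toy input with one missing value (hypothetical, for illustration only)
x <- c("0", "1", NA)

# Index-assignment approach: start with all ones, overwrite matches with 0
v <- rep(1, length(x))
v[x == "0"] <- 0          # NA in the index is skipped, so v[3] stays 1

# Base ifelse for comparison: the NA in the condition is carried through
w <- ifelse(x == "0", 0, 1)

print(v)
print(w)
```

If the data can contain `NA`, the two approaches are therefore not interchangeable without an explicit decision on how missing values should be coded.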
Upvotes: 0
Reputation: 6813
When working with boolean types and/or conditions, you can use them with mathematical operators and they will be interpreted as 1 or 0 (for TRUE and FALSE). So +("0" == 0) returns 1, and 1 - ("0" == 0) returns 0.
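As a small illustration of that coercion, the sketch below applies arithmetic to a made-up logical vector `flags` (a name chosen here, not from the question). Unary plus yields an integer vector, while subtracting from the double `1` yields a double vector with the coding flipped.

```r
# Logical values are coerced in arithmetic: TRUE becomes 1, FALSE becomes 0
flags <- c(TRUE, FALSE, TRUE)

as_plus  <- +flags      # unary plus: integer vector 1 0 1
as_minus <- 1 - flags   # flipped coding: double vector 0 1 0
n_true   <- sum(flags)  # sum counts the TRUE values

print(as_plus)
print(as_minus)
print(n_true)
```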
If you have a vector like this
set.seed(666)
subset_variables1 <- sample(c("0", "1"), 10000, replace = TRUE)
You can use 1 - (subset_variables1 == "0") to get the required result.
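To confirm that the arithmetic form gives the same values as the original `ifelse` call, here is a minimal check on a short made-up vector `x` (hypothetical, including an `NA` to cover missing values). Both expressions return a double vector, so `identical` is a reasonable comparison.

```r
# Small test vector with a missing value (illustrative only)
x <- c("0", "1", "1", "0", NA)

via_ifelse <- ifelse(x == "0", 0, 1)  # original approach from the question
via_arith  <- 1 - (x == "0")          # arithmetic on the logical comparison

# TRUE when both approaches agree, including at the NA position
same <- identical(via_ifelse, via_arith)
print(same)
```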
I compared it with a couple of suggestions from the comments, and it was the fastest:
library(microbenchmark)
microbenchmark(ifelse = ifelse(subset_variables1 == "0", 0, 1),
as.numeric = as.numeric(subset_variables1),
if_else = dplyr::if_else(subset_variables1 == "0", 0, 1),
plus = 1 - (subset_variables1 == "0"),
times = 1000
)
Unit: microseconds
       expr     min       lq     mean   median       uq      max neval
     ifelse 686.668 701.3440 977.0863 910.6570 1170.816 3222.192  1000
 as.numeric 631.813 642.5910 715.8687 677.3830  720.841 1819.925  1000
    if_else 347.409 377.0665 537.3344 482.7055  657.468 1603.241  1000
       plus  97.170  98.8845 129.9091 107.8545  146.303  741.557  1000
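Since the question is about memory as well as speed, one possible variation (not benchmarked above, and offered only as a sketch) is to subtract from the integer literal `1L` instead of `1`. The result is then an integer vector, which R stores in 4 bytes per element rather than the 8 bytes used for doubles. The names `as_double` and `as_integer` are made up for this example.

```r
set.seed(666)
x <- sample(c("0", "1"), 10000, replace = TRUE)

as_double  <- 1  - (x == "0")  # double result, 8 bytes per element
as_integer <- 1L - (x == "0")  # integer result, 4 bytes per element

# Same values, smaller footprint for the integer version
print(all(as_double == as_integer))
print(object.size(as_double))
print(object.size(as_integer))
```

Whether the saving matters depends on how the result is used afterwards, since later arithmetic with doubles converts the values back to double.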
Upvotes: 10