user3554004
user3554004

Reputation: 1074

Optimize/vectorize a loop in R that generates randoms from ranges in input vectors?

Problem:

I am using a loop in R to create a new vector from two ("parent") vectors, generating a random value for each position in the new vector that is in the range of the values that the parents have in this position (it's for the crossover phase in a genetic algorithm). Note that I don't want the mean values of x & y, but namely random values that are in range of the values on the respective positions.

Example code:

x = c(0.1, 0.7, 1, 0.8)
y = c(0, 0.9, 0.2, 1)
child = rep(NA, length(x))
for(i in 1:length(x)){
  child[i] = sample(seq(min(x[i], y[i]),  
                        max(x[i],y[i]), by=0.01), 1)
}
# This might yield, for example: 0.02 0.83 0.73 0.88 

Question:

It works fine, but I'm thinking maybe there's a more efficient way to do this (since I need to do this for 100-1000 individuals on each of the thousands of iterations). In R, there are nice fast functions like ifelse, colMeans, max.col, match, rollmean, etc., that work on vectors, so I'm wondering, is there's something like that for my purposes as well? (the apply gang probably wouldn't help much here though, from what I understand). Or is a loop like this really the best I can do?

Upvotes: 2

Views: 204

Answers (3)

ddunn801
ddunn801

Reputation: 1890

Here's a data.table solution with 10 million records in 2 seconds:

library(data.table)
set.seed(4444)
n <- 10000000
system.time({
  dt <- data.table(x=runif(n=n,min=0,max=10),y=runif(n=n,min=0,max=10))
  dt[,child := runif(n=n,min=pmin(x,y),max=pmax(x,y)),by=.I]
})
dt

#user  system elapsed 
#2.01    0.03    2.06 

Upvotes: 3

eipi10
eipi10

Reputation: 93761

Here's an mapply solution:

mapply(runif, 1, pmin(x,y), pmax(x,y))

(Although @jeremycg's solution shows that you don't need *apply functions and can just vectorize the min and max for runif as well.)

Upvotes: 4

jeremycg
jeremycg

Reputation: 24945

We can use runif to get random numbers from a uniform distribution, and pmax and pmin to vectorize the min and max:

round(runif(length(x), pmin(x, y), pmax(x, y)), 2)

A small benchmark:

library(microbenchmark)

set.seed(42)
x <- runif(1000)
y <- runif(1000)

microbenchmark(vectorize ={round(runif(length(x), pmin(x, y), pmax(x, y)), 2)},
               mapply =  {mapply(runif, 1, pmin(x, y), pmax(x, y))},
               lapply = {unlist(lapply(seq_along(x), function(p, q, i) { sample(seq(min(p[i], q[i]), max(p[i],q[i]), by=0.01), 1) }, p=x, q=y))})

Unit: microseconds
      expr       min        lq       mean     median        uq       max neval cld
 vectorize   316.417   321.026   341.6501   336.0015   342.914   529.154   100 a  
    mapply  4311.559  4429.640  4733.0420  4543.6875  4806.535  9935.631   100  b 
    lapply 46987.459 47718.980 50484.6058 48474.5015 53599.756 60043.093   100   c

Upvotes: 6

Related Questions