Reputation: 1074
Problem:
I am using a loop in R to create a new vector from two ("parent") vectors, generating a random value for each position in the new vector that is in the range of the values that the parents have in this position (it's for the crossover phase in a genetic algorithm). Note that I don't want the mean values of x & y, but namely random values that are in range of the values on the respective positions.
Example code:
x = c(0.1, 0.7, 1, 0.8)
y = c(0, 0.9, 0.2, 1)
child = rep(NA, length(x))
for(i in 1:length(x)){
child[i] = sample(seq(min(x[i], y[i]),
max(x[i],y[i]), by=0.01), 1)
}
# This might yield, for example: 0.02 0.83 0.73 0.88
Question:
It works fine, but I'm thinking maybe there's a more efficient way to do this (since I need to do this for 100-1000 individuals on each of the thousands of iterations).
In R, there are nice fast functions like ifelse
, colMeans
, max.col
, match
, rollmean
, etc., that work on vectors, so I'm wondering, is there's something like that for my purposes as well? (the apply
gang probably wouldn't help much here though, from what I understand). Or is a loop like this really the best I can do?
Upvotes: 2
Views: 204
Reputation: 1890
Here's a data.table solution with 10 million records in 2 seconds:
library(data.table)
set.seed(4444)
n <- 10000000
system.time({
dt <- data.table(x=runif(n=n,min=0,max=10),y=runif(n=n,min=0,max=10))
dt[,child := runif(n=n,min=pmin(x,y),max=pmax(x,y)),by=.I]
})
dt
#user system elapsed
#2.01 0.03 2.06
Upvotes: 3
Reputation: 93761
Here's an mapply
solution:
mapply(runif, 1, pmin(x,y), pmax(x,y))
(Although @jeremycg's solution shows that you don't need *apply functions and can just vectorize the min and max for runif
as well.)
Upvotes: 4
Reputation: 24945
We can use runif
to get random numbers from a uniform distribution, and pmax
and pmin
to vectorize the min and max:
round(runif(length(x), pmin(x, y), pmax(x, y)), 2)
A small benchmark:
library(microbenchmark)
set.seed(42)
x <- runif(1000)
y <- runif(1000)
microbenchmark(vectorize ={round(runif(length(x), pmin(x, y), pmax(x, y)), 2)},
mapply = {mapply(runif, 1, pmin(x, y), pmax(x, y))},
lapply = {unlist(lapply(seq_along(x), function(p, q, i) { sample(seq(min(p[i], q[i]), max(p[i],q[i]), by=0.01), 1) }, p=x, q=y))})
Unit: microseconds
expr min lq mean median uq max neval cld
vectorize 316.417 321.026 341.6501 336.0015 342.914 529.154 100 a
mapply 4311.559 4429.640 4733.0420 4543.6875 4806.535 9935.631 100 b
lapply 46987.459 47718.980 50484.6058 48474.5015 53599.756 60043.093 100 c
Upvotes: 6