Is there a way for me to avoid the for loop or make it more efficient?

Question

I want to pick one element from x, one element from y (x and y are mutually exclusive), and a third element from x or y that has not already been selected. I then want to repeat this process a specified number of times and store the results of each trial in a data frame. (Note: I am not trying to enumerate every possible combination.)

The code below works but slows considerably as the number of trials increases.

x <- 1:4
y <- 5:8
z <- c(x, y)  # pool of all elements from x and y
trials <- 5
sel <- data.frame()
set.seed(123)
for (i in 1:trials){
    x_sel <- sample(x, 1)
    y_sel <- sample(y, 1)
    rem <- z[!(z %in% c(x_sel, y_sel))]
    z_sel <- sample(rem, 1)
    sel <- rbind(sel, cbind(x_sel, y_sel, z_sel))
}

joran · Accepted Answer

This should be somewhat faster, though I doubt it's the fastest possible; I would expect an Rcpp implementation to be faster still.

> set.seed(123)
> x <- 1:4
> y <- 5:8
> z <- c(x, y)
> trials <- 5
> 
> xval <- sample(x, size = trials, replace = TRUE)
> yval <- sample(y, size = trials, replace = TRUE)
> zval <- mapply(FUN = function(x, y, z) {sample(setdiff(z, c(x, y)), 1)},
+                x = xval,
+                y = yval,
+                MoreArgs = list(z = z))
> 
> result <- data.frame(xval = xval,
+                      yval = yval,
+                      zval = zval)
> result
  xval yval zval
1    2    5    8
2    4    7    3
3    2    8    6
4    4    7    5
5    4    6    1

With only 10k samples, this appears to be ~37x faster than your for loop. That loop was slow mainly because it appended results to sel one row at a time, not because of anything inherent in for loops. The gap between this approach and a more sensibly written for loop would likely be much smaller.
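A sketch of what a more sensibly written version might look like, assuming the goal is simply to stop growing `sel` with `rbind()`: the hypothetical `sample_loop()` below fills preallocated vectors and builds the data frame once at the end. The hypothetical `sample_vec()` goes further and removes the per-trial loop entirely using rejection sampling (a swapped-in technique, not the method used in this answer): it draws every z at once from the full pool and redraws only the entries that collide with that trial's x or y value. Both assume, as in the question, that x and y are disjoint, contain no repeated values, and together have at least three elements. They consume the random-number stream differently from the `mapply` version, so for the same seed their values need not match the output above.

```r
# Hypothetical helper: same scheme as the question's loop, but the result
# vectors are allocated up front instead of growing a data frame via rbind().
sample_loop <- function(x, y, trials) {
  z <- c(x, y)
  x_sel <- integer(trials)
  y_sel <- integer(trials)
  z_sel <- integer(trials)
  for (i in seq_len(trials)) {
    x_sel[i] <- sample(x, 1)
    y_sel[i] <- sample(y, 1)
    rem <- z[!(z %in% c(x_sel[i], y_sel[i]))]
    # index into rem explicitly; sample(rem, 1) would misbehave if rem
    # ever had length 1 (it would sample from 1:rem instead)
    z_sel[i] <- rem[sample.int(length(rem), 1)]
  }
  data.frame(x_sel = x_sel, y_sel = y_sel, z_sel = z_sel)
}

# Hypothetical helper: loop-free variant based on rejection sampling.
# Assumes z = c(x, y) has distinct values and length(z) >= 3, so every
# trial has at least one allowed z and the while loop terminates.
sample_vec <- function(x, y, trials) {
  z <- c(x, y)
  x_sel <- x[sample.int(length(x), trials, replace = TRUE)]
  y_sel <- y[sample.int(length(y), trials, replace = TRUE)]
  # first pass: draw z uniformly from the whole pool
  z_sel <- z[sample.int(length(z), trials, replace = TRUE)]
  bad <- z_sel == x_sel | z_sel == y_sel
  # redraw only the colliding entries; accepted values are uniform
  # over the elements not already chosen for that trial
  while (any(bad)) {
    z_sel[bad] <- z[sample.int(length(z), sum(bad), replace = TRUE)]
    bad <- z_sel == x_sel | z_sel == y_sel
  }
  data.frame(x_sel = x_sel, y_sel = y_sel, z_sel = z_sel)
}
```

For example, `sample_vec(1:4, 5:8, 10000)` returns a 10,000-row data frame. With 6 of the 8 pool elements allowed in each trial, every redraw succeeds with probability 3/4, so the `while` loop typically finishes after only a few passes.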
