drumminactuary
drumminactuary

Reputation: 119

Is there a way for me to avoid the for loop or make it more efficient?

I want to pick one element from x, one element from y (x and y are mutually exclusive), and one element from x or y that has not already been selected. I then want to repeat the process a specified number of times and store the results of each trial in a dataframe. (note: I am not interested in trying to find every possible combination)

The code below works but slows considerably as the number of trials increases.

x <- 1:4
y <- 5:8
z <- c(x, y) #edited - previous code read a, b in place of x, y
trials <- 5
sel <- data.frame()
set.seed(123)
for (i in 1:trials){
    x_sel <- sample(x, 1)
    y_sel <- sample(y, 1)
    rem <- z[!(z %in% c(x_sel, y_sel))]
    z_sel <- sample(rem, 1)
    sel <- rbind(sel, cbind(x_sel, y_sel, z_sel))
}

Upvotes: 1

Views: 90

Answers (2)

Robert
Robert

Reputation: 11

My approach is not elegant but seems to be efficient when a number of trials is large. In order to prove it I created 3 function: f1 - yours, f2 - joran's, f3 - mine

library(microbenchmark)

f1 <- function() {
   x <- 1:4
   y <- 5:8
   z <- c(x, y) #edited - previous code read a, b in place of x, y
   trials <- 5000
   sel <- data.frame()
   set.seed(123)
   for (i in 1:trials) {
      x_sel <- sample(x, 1)
      y_sel <- sample(y, 1)
      rem <- z[!(z %in% c(x_sel, y_sel))]
      z_sel <- sample(rem, 1)
      sel <- rbind(sel, cbind(x_sel, y_sel, z_sel))
   }
   return(sel)
}

f2 <- function() {
   set.seed(123)
   x <- 1:4
   y <- 5:8
   z <- c(x, y)
   trials <- 5000

   xval <- sample(x, size = trials, replace = TRUE)
   yval <- sample(y, size = trials, replace = TRUE)
   zval <-
      mapply(
         FUN = function(x, y, z) {
            sample(setdiff(z, c(x, y)), 1)
         },
         x = xval,
         y = yval,
         MoreArgs = list(z = z)
      )

   result <- data.frame(xval = xval,
                        yval = yval,
                        zval = zval)
   return(result)
}


f3 <- function() {
   x <- 1:4
   y <- 5:8
   z <- c(x, y) #edited - previous code read a, b in place of x, y
   trials <- 5000
   set.seed(123)
   x_sel <- sample(x, trials, replace = TRUE)
   y_sel <- sample(y, trials, replace = TRUE)
   z_mac <- matrix(z,
                   nrow = trials,
                   ncol = length(z),
                   byrow = TRUE)
   take <- z_mac != x_sel & z_mac != y_sel
   z_sel <- t(matrix(t(z_mac)[t(take)], ncol = trials))
   take <- sample(1:ncol(z_sel), size = trials, replace = TRUE)
   cbind(x_sel, y_sel, z_sel = z_sel[cbind(1:trials, take)])
}


microbenchmark(f1(), f2(), f3(), times = 10L)

Unit:milliseconds
expr         min          lq        mean      median          uq         max neval
f1() 2193.448113 2248.442450 2258.626023 2258.135072 2267.333956 2346.457082    10
f2()  205.124501  208.672947  213.520267  212.208095  219.397101  222.990083    10
f3()    2.463567    2.491762    2.570517    2.512588    2.603582    2.827863    10

My f3 function is 856 times faster than f1 and 83 times faster than f2. When we consider oryginal problem (trials=5) then

> microbenchmark(f1(), f2(), f3(), times = 10L)
Unit: microseconds
 expr      min       lq      mean    median       uq      max neval
 f1() 1215.924 1268.790 1296.7610 1300.5095 1321.015 1370.998    10
 f2()  587.937  590.500  619.6248  612.9285  638.881  687.261    10
 f3()   68.886   78.819   86.7652   81.2225   91.315  116.947    10

Upvotes: 1

joran
joran

Reputation: 173517

This should probably be somewhat faster, but I doubt it's the fastest possible. Certainly Rcpp would be the fastest, I would think.

> set.seed(123)
> x <- 1:4
> y <- 5:8
> z <- c(x, y)
> trials <- 5
> 
> xval <- sample(x,size = trials,replace = TRUE)
> yval <- sample(y,size = trials,replace = TRUE)
> zval <- mapply(FUN = function(x,y,z) {sample(setdiff(z,c(x,y)),1)},
                             x = xval,
                             y = yval,
                             MoreArgs = list(z = z))
> 
> result <- data.frame(xval = xval,
                                         yval = yval,
                                         zval = zval)
> result
  xval yval zval
1    2    5    8
2    4    7    3
3    2    8    6
4    4    7    5
5    4    6    1

At only 10k samples, this appears to be ~37x faster than your for loop (which was primarily inefficient because of the appending things one at a time onto sel, not anything inherent in the for loop). The difference between this and a more sensibly written for loop would likely be much less.

Upvotes: 4

Related Questions