Reputation: 119
I want to pick one element from x, one element from y (x and y are mutually exclusive), and one element from x or y that has not already been selected. I then want to repeat the process a specified number of times and store the results of each trial in a dataframe. (note: I am not interested in trying to find every possible combination)
The code below works but slows considerably as the number of trials increases.
x <- 1:4
y <- 5:8
z <- c(x, y) #edited - previous code read a, b in place of x, y
trials <- 5
sel <- data.frame()
set.seed(123)
for (i in 1:trials){
x_sel <- sample(x, 1)
y_sel <- sample(y, 1)
rem <- z[!(z %in% c(x_sel, y_sel))]
z_sel <- sample(rem, 1)
sel <- rbind(sel, cbind(x_sel, y_sel, z_sel))
}
Upvotes: 1
Views: 90
Reputation: 11
My approach is not elegant but seems to be efficient when a number of trials is large. In order to prove it I created 3 function: f1 - yours, f2 - joran's, f3 - mine
library(microbenchmark)
f1 <- function() {
x <- 1:4
y <- 5:8
z <- c(x, y) #edited - previous code read a, b in place of x, y
trials <- 5000
sel <- data.frame()
set.seed(123)
for (i in 1:trials) {
x_sel <- sample(x, 1)
y_sel <- sample(y, 1)
rem <- z[!(z %in% c(x_sel, y_sel))]
z_sel <- sample(rem, 1)
sel <- rbind(sel, cbind(x_sel, y_sel, z_sel))
}
return(sel)
}
f2 <- function() {
set.seed(123)
x <- 1:4
y <- 5:8
z <- c(x, y)
trials <- 5000
xval <- sample(x, size = trials, replace = TRUE)
yval <- sample(y, size = trials, replace = TRUE)
zval <-
mapply(
FUN = function(x, y, z) {
sample(setdiff(z, c(x, y)), 1)
},
x = xval,
y = yval,
MoreArgs = list(z = z)
)
result <- data.frame(xval = xval,
yval = yval,
zval = zval)
return(result)
}
f3 <- function() {
x <- 1:4
y <- 5:8
z <- c(x, y) #edited - previous code read a, b in place of x, y
trials <- 5000
set.seed(123)
x_sel <- sample(x, trials, replace = TRUE)
y_sel <- sample(y, trials, replace = TRUE)
z_mac <- matrix(z,
nrow = trials,
ncol = length(z),
byrow = TRUE)
take <- z_mac != x_sel & z_mac != y_sel
z_sel <- t(matrix(t(z_mac)[t(take)], ncol = trials))
take <- sample(1:ncol(z_sel), size = trials, replace = TRUE)
cbind(x_sel, y_sel, z_sel = z_sel[cbind(1:trials, take)])
}
microbenchmark(f1(), f2(), f3(), times = 10L)
Unit:milliseconds
expr min lq mean median uq max neval
f1() 2193.448113 2248.442450 2258.626023 2258.135072 2267.333956 2346.457082 10
f2() 205.124501 208.672947 213.520267 212.208095 219.397101 222.990083 10
f3() 2.463567 2.491762 2.570517 2.512588 2.603582 2.827863 10
My f3 function is 856 times faster than f1 and 83 times faster than f2. When we consider oryginal problem (trials=5) then
> microbenchmark(f1(), f2(), f3(), times = 10L)
Unit: microseconds
expr min lq mean median uq max neval
f1() 1215.924 1268.790 1296.7610 1300.5095 1321.015 1370.998 10
f2() 587.937 590.500 619.6248 612.9285 638.881 687.261 10
f3() 68.886 78.819 86.7652 81.2225 91.315 116.947 10
Upvotes: 1
Reputation: 173517
This should probably be somewhat faster, but I doubt it's the fastest possible. Certainly Rcpp would be the fastest, I would think.
> set.seed(123)
> x <- 1:4
> y <- 5:8
> z <- c(x, y)
> trials <- 5
>
> xval <- sample(x,size = trials,replace = TRUE)
> yval <- sample(y,size = trials,replace = TRUE)
> zval <- mapply(FUN = function(x,y,z) {sample(setdiff(z,c(x,y)),1)},
x = xval,
y = yval,
MoreArgs = list(z = z))
>
> result <- data.frame(xval = xval,
yval = yval,
zval = zval)
> result
xval yval zval
1 2 5 8
2 4 7 3
3 2 8 6
4 4 7 5
5 4 6 1
At only 10k samples, this appears to be ~37x faster than your for loop (which was primarily inefficient because of the appending things one at a time onto sel
, not anything inherent in the for
loop). The difference between this and a more sensibly written for
loop would likely be much less.
Upvotes: 4