Resampling with a loop in R

Question

Consider the following data:

library(Benchmarking)
d <- data.frame(x1=c(100,200,30,500), x2=c(300,200,10,50), y=c(75,100,3000,400))

So I have 4 observations.

Now I want to randomly select 2 observations from d, two times (without repetition). Each time, I want to calculate the following:

e <- dea(d[c('x1', 'x2')], d$y)
weighted.mean(eff(e), d$y)

That gives me two numbers, which I then want to average. Can someone show how to do this with a loop in R?

Example:

Suppose observations 1 and 3 were selected the first time, and 2 and 3 the second time (of course, this could be different). This would give me the following results:

0.9829268 0.9725806

This is because (here I have entered the observations manually):

> d1 <- data.frame(x1=c(100,30), x2=c(300,10), y=c(75,3000))
> e1 <- dea(d1[c('x1', 'x2')], d1$y)
> weighted.mean(eff(e1), d1$y)
[1] 0.9829268
> 
> d2 <- data.frame(x1=c(200,30), x2=c(200,10), y=c(100,3000))
> e2 <- dea(d2[c('x1', 'x2')], d2$y)
> weighted.mean(eff(e2), d2$y)
[1] 0.9725806

And the mean of these two numbers is:

0.9777537

My attempt:

I have tried:

for (r in 1:2)
{
  a <- (1:4)
  s <- sample(a, 2, replace = FALSE)

  es <- dea([s, c('x1', 'x2')], y[s])
  esav[i] <- weighted.mean(eff(es), y[s])
}
mean(esav)

But this does not work. Can someone help me?

digEmAll · Accepted Answer

Here's a possible approach (if I understood you correctly):

library(Benchmarking)

set.seed(123) # just to reproduce this case

d <- data.frame(x1=c(100,200,30,500), x2=c(300,200,10,50), y=c(75,100,3000,400))
# generate all possible pairs of row indexes
allPossibleRowIndexes <- combn(1:nrow(d),2,simplify=FALSE)
# randomly select maxcomb of those pairs (without repetition)
maxcomb <- 3 # I chose 3... you can also test all the possibilities
rowIndexesRand <- sample(allPossibleRowIndexes,min(maxcomb,length(allPossibleRowIndexes)))

esav <- NULL
for (rowIdxs in rowIndexesRand){
  es <- dea(d[rowIdxs, c('x1', 'x2')], d$y[rowIdxs])
  esav <- c(esav,weighted.mean(eff(es), d$y[rowIdxs]))
}
avg <- mean(esav)

# or, alternatively, using sapply instead of a loop
avg <- mean(sapply(rowIndexesRand,function(rowIdxs){
  es <- dea(d[rowIdxs, c('x1', 'x2')], d$y[rowIdxs])
  esav <- weighted.mean(eff(es), d$y[rowIdxs])
  return(esav)
}))

Results:

> esav
[1] 0.9829268 0.9725806 0.9058824
> avg
[1] 0.9537966
> rowIndexesRand
[[1]]
[1] 1 3

[[2]]
[1] 2 3

[[3]]
[1] 3 4
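
For reference, the loop in the question fails for a few reasons: the data frame name is missing before `[s, ...]`, `y` is referenced without `d$`, the result is stored at index `i` although the loop variable is `r`, and `esav` is never created before the loop. A minimal sketch of a corrected loop could look like the following. The names `resample_stat` and `toy_stat` are hypothetical, and the stand-in statistic is only there so the sketch runs without Benchmarking. Note that, unlike the combn approach above, independent draws like these can pick the same pair twice.

```r
# Toy data from the question
d <- data.frame(x1 = c(100, 200, 30, 500),
                x2 = c(300, 200, 10, 50),
                y  = c(75, 100, 3000, 400))

# Hypothetical helper (not part of Benchmarking): apply `stat` to `times`
# random subsets of `k` rows and return the individual values.
resample_stat <- function(d, k, times, stat) {
  out <- numeric(times)                    # create the result vector first
  for (r in seq_len(times)) {
    s <- sample(nrow(d), k)                # k distinct row indexes
    out[r] <- stat(d[s, , drop = FALSE])   # store at the loop variable r
  }
  out
}

# Stand-in statistic (an assumption, chosen only to avoid the dependency).
# For the DEA case from the question, pass instead:
#   function(sub) {
#     e <- dea(sub[c('x1', 'x2')], sub$y)
#     weighted.mean(eff(e), sub$y)
#   }
toy_stat <- function(sub) weighted.mean(sub$x1, sub$y)

set.seed(1)  # for reproducibility
vals <- resample_stat(d, k = 2, times = 2, stat = toy_stat)
avg_toy <- mean(vals)
```

Passing the statistic as a function keeps the resampling logic separate from the DEA call, so the same loop can be reused for other measures.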

EDIT:

As per the comment, you can generate unique random index sets without enumerating all combinations by using the following function.
It is not very efficient, since it resamples whenever the drawn combination has already been extracted.

# function that (not very efficiently) returns n unique random samples
# of size k, taken from the set 1...size
getRandomSamples <- function(size,k,n){
  # ensure n is <= the number of combinations
  n <- min(n,choose(size,k))
  env <- new.env()
  for(i in seq_len(n)){
    # sample until it's not a duplicate
    while(TRUE){
      set <- sort(sample.int(size,k))
      key <- paste(set,collapse=',')
      if(is.null(env[[key]])){
        env[[key]] <- set
        break
      }
    }
  }
  unname(as.list(env))
}

# usage example
set.seed(1234) # for reproducibility
getRandomSamples(60,36,5)
[[1]]
 [1]  1  2  4  7  8 10 11 12 13 14 15 16 17 18 20 21 22 23 24 26 30 31 32 33 34 35 36 37 42 43 44 46 47 55 58 59

[[2]]
 [1]  3  4  5  8 10 11 12 13 14 16 17 18 19 20 22 23 24 25 26 29 32 33 35 38 40 43 44 45 47 48 49 50 51 55 56 58

[[3]]
 [1]  1  2  4  5  6  7  8  9 10 11 14 18 19 22 25 27 28 30 36 37 38 39 40 43 46 47 49 50 51 53 54 55 57 58 59 60

[[4]]
 [1]  1  2  5  7  8  9 10 12 13 14 18 19 27 29 30 31 35 36 37 38 42 43 44 46 47 48 49 51 52 53 55 56 57 58 59 60

[[5]]
 [1]  3  5  6  7  9 11 12 13 15 16 19 20 21 22 24 26 27 30 31 32 35 36 37 39 40 42 43 44 45 46 49 50 51 54 55 60
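
The two approaches above can also be combined: enumerate with combn when the number of combinations is small, and fall back to rejecting duplicates when it is not. The following is only a sketch under that assumption. `sample_unique_subsets` and the `max_enum` threshold are hypothetical, and the default of 1e5 is an arbitrary illustrative value. This version also returns the sets in the order they were drawn.

```r
# Hypothetical wrapper: enumerate when cheap, otherwise reject duplicates.
sample_unique_subsets <- function(size, k, n, max_enum = 1e5) {
  total <- choose(size, k)
  n <- min(n, total)                 # cannot return more than exist
  if (total <= max_enum) {
    # few combinations: list them all and pick n without replacement
    all_sets <- combn(size, k, simplify = FALSE)
    return(all_sets[sample.int(length(all_sets), n)])
  }
  # many combinations: draw repeatedly and skip sets already seen
  seen <- character(0)
  out <- vector("list", n)
  i <- 0
  while (i < n) {
    set <- sort(sample.int(size, k))
    key <- paste(set, collapse = ",")
    if (!(key %in% seen)) {
      i <- i + 1
      seen <- c(seen, key)
      out[[i]] <- set
    }
  }
  out
}

set.seed(1234)  # for reproducibility
small <- sample_unique_subsets(4, 2, 3)    # choose(4, 2) = 6, enumerated
big   <- sample_unique_subsets(60, 36, 5)  # far too many to enumerate
```

With 60 items and subsets of 36, the number of combinations is so large that a duplicate draw is very unlikely, so the rejection loop rarely repeats in that case.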
