Malte Dvl

Reputation: 19

Create conditional random sequence

I want to create random sequences for the variables a, b, c, d, e and f with the length of 6000 under specific conditions.

I want to randomly draw from a discrete uniform distribution between 10 and 40 for every sequence, but under the following condition:

a = f < (a+b)/2 < e < c < b < d

Does anyone know how I would code that?

Upvotes: 1

Views: 78

Answers (3)

John Coleman

Reputation: 51998

The conditions are somewhat ad-hoc. A hit and miss approach which draws random vectors until the conditions are satisfied could work (though it might not be optimal). Something like:

randvect <- function(){
  v <- sample(10:40,5)
  while(any(c(v[1] >= v[2],
              mean(v[1:2]) >= v[5],
              v[5] >= v[3],
              v[3] >= v[2],
              v[2] >= v[4]))){
    v <- sample(10:40,5)
  }
  v
}

For example,

> randvect()
[1] 16 26 25 36 23

(I don't bother with f since it is the same as a).

To get 6000:

vects <- replicate(6000,randvect()) 

With all the misses in the hit and miss, that takes about 30 seconds to evaluate on my machine.
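A possible speed-up, offered as a sketch rather than a tuned implementation: draw candidates in large batches and filter them with vectorised comparisons instead of looping over one vector at a time. The names randvects_batch and batch_size are made up for this illustration. Candidates are drawn with replacement here, which should not change which vectors are accepted, because the conditions already force all five values to be distinct.

```r
# Batched hit-and-miss: same conditions as randvect(), checked on whole columns.
# randvects_batch and batch_size are hypothetical names, not part of the answer.
randvects_batch <- function(n, batch_size = 1e5) {
  cols <- c("a", "b", "c", "d", "e")
  keep <- matrix(integer(0), nrow = 0, ncol = 5, dimnames = list(NULL, cols))
  while (nrow(keep) < n) {
    # One candidate vector per row
    m <- matrix(sample(10:40, 5 * batch_size, replace = TRUE),
                ncol = 5, dimnames = list(NULL, cols))
    ok <- m[, "a"] < m[, "b"] &
      (m[, "a"] + m[, "b"]) / 2 < m[, "e"] &
      m[, "e"] < m[, "c"] &
      m[, "c"] < m[, "b"] &
      m[, "b"] < m[, "d"]
    keep <- rbind(keep, m[ok, , drop = FALSE])
  }
  # Return exactly n accepted rows
  keep[seq_len(n), , drop = FALSE]
}
```

Calling randvects_batch(6000) returns a 6000 x 5 matrix with one vector per row, whereas replicate() returns one vector per column.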

Upvotes: 2

Eric Watt

Reputation: 3230

John Coleman's answer will get there, and may be a better way to sample randomly, but it could take a long time depending on how large your allowable space is.

Another option is to figure out the allowable space and sample sequentially, starting with a.

  • a has to be between 10 and 34 (to leave room for e, c, b, and d)
  • the average of a and b has to be < (b - 2) and < 37. This means b has to be at least 5 more than a, and at most 39
    • a + 4 < b < min((37 * 2) - a, 40)
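These bounds can be sanity-checked by brute force. The sketch below enumerates every (a, b) pair and marks a pair as feasible when integers e, c, and d can still be fitted in; grid, e_min, feasible, and ok are names invented for this check.

```r
# Every combination of a and b in the allowed range
grid <- expand.grid(a = 10:40, b = 10:40)

# Smallest integer e that is strictly above the average of a and b
e_min <- floor((grid$a + grid$b) / 2) + 1

# Feasible when a < b, e < c < b still fits (e_min <= b - 2), and d = b + 1 <= 40
feasible <- grid$a < grid$b & e_min <= grid$b - 2 & grid$b + 1 <= 40
ok <- grid[feasible, ]

range(ok$a)        # allowed values of a
min(ok$b - ok$a)   # smallest allowed gap between a and b
max(ok$b)          # largest allowed b
```

For the range 10 to 40 this reports a from 10 to 34, a minimum gap of 5 between a and b, and a maximum b of 39.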

The rest are a bit more straightforward. These can be wrapped into a function.

I'm going to use data.table, mostly for looking at the results at the end. I'm also using the function resample described in help(sample) to handle cases where there is only a single value to sample from (plain sample(x, 1) with a single number x would draw from 1:x instead).

library(data.table)

resample <- function(x, ...) x[sample.int(length(x), ...)]

funky <- function() {

  a <- resample(10:34, 1)
  f <- a
  b <- resample((a + 5):min(((37 * 2) - a + 1), 39), 1)
  e <- resample(ceiling((a+b)/2 + 0.1):min(38, b - 2), 1)
  c <- resample((e + 1):(b - 1), 1)
  d <- resample((b + 1):40, 1)

  c(a, b, c, d, e, f)

}

A few issues were found by trial and error. In e, the 0.1 is added so that an integer average gets bumped up to the next integer (e has to be strictly greater than the average), while a value of X.5 still gets rounded up to X + 1.
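A tiny check of what the 0.1 does for both kinds of average. The floor(avg) + 1 line is an equivalent formulation shown only for comparison, not something used in the function above, and avg and lower_bound are illustrative names.

```r
avg <- c(12, 12.5)                 # an integer average and a half-integer average

ceiling(avg)                       # 12 13: the integer case would allow e == average
lower_bound <- ceiling(avg + 0.1)  # 13 13: smallest integer strictly above the average
lower_bound
floor(avg) + 1                     # 13 13: same result without the 0.1 offset
```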

dat <- data.table(t(replicate(10000, funky())))
setnames(dat, c("a", "b", "c", "d", "e", "f"))

The following will return all rows that fail the tests in the original question. A few iterations with 10k samples and it doesn't look like anything is failing.

dat[!(a == f &  
        f < ((a + b) / 2) & 
        ((a + b) / 2) < e &
        e < c &
        c < b &
        b < d)]

Upvotes: 0

Acccumulation

Reputation: 3591

This question isn’t really well defined, as different implementations result in different distributions. For instance, take the condition b < d. The most natural interpretation is to draw b and d independently and throw out any draw that violates the condition, but that is also the most computationally expensive. You can improve on it by drawing b and d at random and then switching them if b > d. I think this logic can be extended to e, c, b, d: randomly choose four numbers between 10 and 40, then assign e to be the smallest, c the second smallest, etc. I think this will produce the same distribution as the “throw out” method, but I’m not sure. So to get e, c, b, and d:

# Draw without replacement so the four values are distinct (strict inequalities)
numbers = sort(sample(10:40, 4))
e = numbers[1]
c = numbers[2]
b = numbers[3]
d = numbers[4]
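Whether sorting matches the "throw out" method can be checked exactly on a toy case. The sketch below assumes distinct draws (no ties) and uses two values from 1:5 instead of four from 10:40. It only covers the ordering constraint, not the (a+b)/2 condition, and g, sorted_pairs, and thrown are illustrative names.

```r
# All ordered pairs of distinct values from 1:5, each equally likely
g <- expand.grid(x = 1:5, y = 1:5)
g <- g[g$x != g$y, ]

# Method 1: sort every draw, then count how often each increasing pair occurs
sorted_pairs <- table(paste(pmin(g$x, g$y), pmax(g$x, g$y)))

# Method 2: throw out every draw that is not already increasing
thrown <- table(paste(g$x, g$y)[g$x < g$y])

sorted_pairs   # every increasing pair appears twice
thrown         # every increasing pair appears once
```

Both methods put equal weight on each increasing pair, which suggests the sorted draw is uniform over increasing tuples as long as the values are distinct.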

I'm still thinking about what to do with a, however.

Upvotes: 0
