Reputation: 19
I want to generate random sequences of length 6000 for the variables a, b, c, d, e and f, subject to specific conditions.
Each value should be drawn from a discrete uniform distribution on the integers 10 to 40, but under the following condition:
a = f < (a+b)/2 < e < c < b < d
Does anyone know how I would code that?
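For reference, the condition can be written as one vectorised predicate. This is a minimal sketch: the name satisfies_conditions is hypothetical, and the range check assumes the 10 to 40 bounds from the question.

```r
# Hypothetical helper (not from the question): TRUE where a tuple satisfies
# a = f < (a+b)/2 < e < c < b < d with every value inside 10..40.
# The chain orders everything, so checking a >= 10 and d <= 40 covers the range.
satisfies_conditions <- function(a, b, c, d, e, f) {
  m <- (a + b) / 2
  a == f & f < m & m < e & e < c & c < b & b < d & a >= 10 & d <= 40
}

satisfies_conditions(16, 26, 25, 36, 23, 16)  # TRUE
satisfies_conditions(16, 26, 25, 36, 21, 16)  # FALSE: e equals (a+b)/2 = 21
```

Because it uses `&` rather than `&&`, the same function can check whole columns of generated values at once.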
Upvotes: 1
Views: 78
Reputation: 51998
The conditions are somewhat ad hoc. A hit-and-miss approach, which draws random vectors until the conditions are satisfied, could work (though it might not be optimal). Something like:
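```r
# Draw (a, b, c, d, e) as 5 distinct values from 10:40; f is just a copy of a.
randvect <- function() {
  v <- sample(10:40, 5)
  # Redraw while any constraint is violated:
  # a < b, (a+b)/2 < e, e < c, c < b, b < d
  while (any(c(v[1] >= v[2],
               mean(v[1:2]) >= v[5],
               v[5] >= v[3],
               v[3] >= v[2],
               v[2] >= v[4]))) {
    v <- sample(10:40, 5)
  }
  v
}
```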
For example:
```r
> randvect()
[1] 16 26 25 36 23
```
The vector holds a, b, c, d and e in that order (I don't bother with f since it is the same as a).
To get 6000 of them (replicate() returns a 5 x 6000 matrix with one draw per column):
```r
vects <- replicate(6000, randvect())
```
Because of all the misses in the hit-and-miss loop, this takes about 30 seconds on my machine.
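If the 30 seconds matter, one option (a sketch of my own, not part of this answer) is to vectorise the same hit-and-miss idea: draw a large batch of candidate rows at once and keep the valid ones. The function name rand_conditioned and the batch size are hypothetical. Drawing with replacement is fine here because the chain a < e < c < b < d forces all five values to be distinct, so the accepted rows should still be uniform over the valid tuples, just like randvect().

```r
# Hypothetical vectorised variant of the hit-and-miss idea: draw `batch`
# candidate rows at once and keep the ones satisfying the chain.
rand_conditioned <- function(n, lo = 10L, hi = 40L, batch = 1e6) {
  out <- matrix(integer(0), ncol = 5)
  while (nrow(out) < n) {
    # Columns are a, b, c, d, e (same order as randvect()).
    m <- matrix(sample(lo:hi, 5 * batch, replace = TRUE), ncol = 5)
    keep <- m[, 1] < m[, 2] &                 # a < b
      (m[, 1] + m[, 2]) / 2 < m[, 5] &        # (a+b)/2 < e
      m[, 5] < m[, 3] &                       # e < c
      m[, 3] < m[, 2] &                       # c < b
      m[, 2] < m[, 4]                         # b < d
    out <- rbind(out, m[keep, , drop = FALSE])
  }
  out <- out[seq_len(n), , drop = FALSE]
  res <- cbind(out, out[, 1])                 # f is a copy of a
  colnames(res) <- c("a", "b", "c", "d", "e", "f")
  res
}

vects <- rand_conditioned(6000)  # 6000 x 6 matrix, one draw per row
```

Larger batches mean fewer passes through the loop at the cost of more memory per pass.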
Upvotes: 2
Reputation: 3230
John Coleman's answer will get there, and may be a better way to sample randomly, but it could take a long time depending on how large the allowable space is.
Another option is to work out the allowable space and sample sequentially, starting with a. Given a, b must satisfy:
a + 4 < b < min((37 * 2) - a, 40)
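Where the lower bound on b comes from (my own working of it): e and c must be two distinct integers strictly between (a + b)/2 and b, so

$$
b \ge c + 1 \ge e + 2 \ge \left\lfloor \frac{a+b}{2} \right\rfloor + 3,
$$

$$
b - a \text{ even: } b \ge \frac{a+b}{2} + 3 \iff b \ge a + 6, \qquad
b - a \text{ odd: } b \ge \frac{a+b-1}{2} + 3 \iff b \ge a + 5.
$$

Either way b ≥ a + 5, i.e. a + 4 < b, and b = a + 5 is attainable (for example a = 10, b = 15, e = 13, c = 14). Since b < d ≤ 40 forces b ≤ 39, it follows that a ≤ 34, so the (37 * 2) - a term is at least 40 and never binds.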
The remaining variables are more straightforward, and everything can be wrapped into a function.
I'm using data.table mainly to inspect the results at the end. I'm also using the resample function described in help(sample) to handle cases where there is only a single value to sample from; sample(x, 1) with a single number x would draw from 1:x instead.
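```r
library(data.table)

# From help(sample): always samples from the elements of x, even when length(x) == 1
resample <- function(x, ...) x[sample.int(length(x), ...)]

funky <- function() {
  a <- resample(10:34, 1)  # a <= 34 leaves room for b >= a + 5 and d > b
  f <- a
  # the upper bound is always 39 in practice, since a <= 34
  b <- resample((a + 5):min(((37 * 2) - a + 1), 39), 1)
  # smallest integer strictly above (a + b)/2, up to b - 2 to leave room for c
  e <- resample(ceiling((a + b) / 2 + 0.1):min(38, b - 2), 1)
  c <- resample((e + 1):(b - 1), 1)
  d <- resample((b + 1):40, 1)
  c(a, b, c, d, e, f)
}
```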
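To see why resample() matters here: when a = 34, the range for b in funky() collapses to the single value 39, and sample() given a single number n draws from 1:n instead. A small illustration (range_b is just an illustrative name):

```r
# resample() as given in help(sample): always samples from the elements of x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# With a = 34, the allowed range for b is the single value 39.
a <- 34
range_b <- (a + 5):min(((37 * 2) - a + 1), 39)
range_b               # 39

sample(range_b, 1)    # treats 39 as 1:39, so may return any value in 1:39
resample(range_b, 1)  # always 39
```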
I found a few issues by trial and error. In the lower bound for e, 0.1 is added before ceiling() so that the result is always the next integer above the mean: if the mean is an integer X it becomes X + 1, and if it is X.5 it is also rounded up to X + 1.
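An equivalent way to get the smallest integer strictly above (a + b)/2 is floor((a + b)/2) + 1, which does not rely on the mean being a multiple of 0.5. A quick check that the two agree over the whole 10 to 40 range (the variable names are just for illustration):

```r
# Every (a, b) pair in 10:40, and the lower bound for e computed two ways.
grid <- expand.grid(a = 10:40, b = 10:40)
m <- (grid$a + grid$b) / 2
lb_ceiling <- ceiling(m + 0.1)  # the trick used in funky()
lb_floor <- floor(m) + 1        # smallest integer strictly greater than m
all(lb_ceiling == lb_floor)     # TRUE
```

The +0.1 trick works here only because (a + b)/2 is always an integer or a half-integer; the floor() form holds for any real mean.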
```r
dat <- data.table(t(replicate(10000, funky())))
setnames(dat, c("a", "b", "c", "d", "e", "f"))
```
The following returns all rows that fail the conditions from the original question. After a few runs of 10,000 samples, nothing appears to fail.
```r
dat[!(a == f &
      f < ((a + b) / 2) &
      ((a + b) / 2) < e &
      e < c &
      c < b &
      b < d)]
```
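Because the allowable space is small, it can also be enumerated outright. This is a different technique from funky(), sketched under my own assumptions; valid_tuples is a hypothetical helper. It makes the non-uniformity of sequential sampling visible: a = 34 admits exactly one valid tuple, yet funky() picks a = 34 with probability 1/25, whereas uniform sampling over the space would pick that tuple with probability 1/nrow(tuples).

```r
# Hypothetical helper: list every (a, b, c, d, e) in lo..hi satisfying
# a < (a+b)/2 < e < c < b < d (f is just a copy of a).
valid_tuples <- function(lo = 10L, hi = 40L) {
  rows <- list()
  for (a in lo:hi) {
    for (b in lo:hi) {
      if (b - a < 5 || b >= hi) next   # need b >= a + 5 and room for d > b
      e_min <- floor((a + b) / 2) + 1  # smallest integer above the mean
      for (e in e_min:(b - 2)) {
        for (cc in (e + 1):(b - 1)) {
          # every d above b works, so add all of them at once
          rows[[length(rows) + 1]] <- cbind(a, b, cc, (b + 1):hi, e)
        }
      }
    }
  }
  out <- do.call(rbind, rows)
  colnames(out) <- c("a", "b", "c", "d", "e")
  out
}

tuples <- valid_tuples()
nrow(tuples)              # size of the allowable space
sum(tuples[, "a"] == 34)  # 1: only (a, b, c, d, e) = (34, 39, 38, 40, 37)

# Uniform over the allowable space, the same target as the hit-and-miss answer:
draws <- tuples[sample.int(nrow(tuples), 6000, replace = TRUE), ]
```

The last line samples rows uniformly with replacement, so it should match the distribution of the hit-and-miss approach without any rejected draws.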
Upvotes: 0
Reputation: 3591
This question isn’t really well defined, because different implementations produce different distributions. For instance, take the condition b < d: you could draw b first and then draw d from the values above b, or draw b and d independently and throw out any pair that violates b < d. The latter is the most natural interpretation, but also the most computationally expensive. You can speed it up by drawing b and d and swapping them if b > d. I think this logic extends to e, c, b, d: randomly choose four numbers between 10 and 40, then assign e to be the smallest, c the second smallest, and so on. I think this produces the same distribution as the “throw out” method, but I’m not sure. So to get e, c, b and d:
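```r
# Draw four distinct values (no replacement) so the inequalities stay strict;
# sorting then assigns e < c < b < d.
numbers <- sort(sample(10:40, 4))
e <- numbers[1]
c <- numbers[2]
b <- numbers[3]
d <- numbers[4]
```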
I'm still thinking about what to do with a, however.
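One way to bring a into the sorting idea (my own sketch, not a settled answer; draw_one is a hypothetical name) is to keep the sorted draw for e < c < b < d, draw a independently from 10:40, and redraw everything until (a + b)/2 < e. Every increasing (e, c, b, d) arises from exactly 4! = 24 orderings of four distinct draws, so each is equally likely after sorting; with a drawn independently, every valid tuple has the same chance of being proposed, so keeping only the accepted proposals should give the same uniform distribution as the throw-out method.

```r
# Hypothetical helper extending the sorting idea to a: sort four distinct draws
# into e < c < b < d, draw a independently, and redraw until (a + b)/2 < e.
draw_one <- function(lo = 10L, hi = 40L) {
  repeat {
    v <- sort(sample(lo:hi, 4))  # distinct values, so the order is strict
    e <- v[1]; cc <- v[2]; b <- v[3]; d <- v[4]
    a <- sample(lo:hi, 1)
    # a + b < 2e is (a + b)/2 < e in integers; it also implies a < b,
    # because e < b gives 2e - b < b.
    if (a + b < 2 * e) return(c(a = a, b = b, c = cc, d = d, e = e, f = a))
  }
}

vects <- t(replicate(6000, draw_one()))  # 6000 x 6, columns a..f
```

Only a is subject to rejection here, so far fewer draws are thrown away than when all five values are drawn blindly.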
Upvotes: 0