Reputation: 35
My aim is to randomly generate a vector of integers using R, which is populated by numbers between 1-8
. However, I want to keep growing the vector until all the numbers between 1:8 are represented at least once, e.g. 1,4,6,2,2,3,5,1,4,7,6,8
.
I am able to generate single numbers or a sequence of numbers using sample
x=sample(1:8,1, replace=T)
>x
[1] 6
I have played around with the repeat
function to see how it might work with sample
and I can at least get the generation to stop when one specific number occurs, e.g.
repeat {
print(x)
x = sample(1:8, 1, replace=T)
if (x == 3){
break
}
}
Which gives:
[1] 3
[1] 6
[1] 6
[1] 6
[1] 6
[1] 6
[1] 2
I am struggling now to work out how to stop number generation once all numbers between 1:8
are present. Additionally, I know that the above code is only printing the sequence as it is generated and not storing it as a vector. Any advice pointing me in the right direction would be really appreciated!
Upvotes: 1
Views: 956
Reputation: 32548
This is fine for 1:8
but might not always be a good idea.
foo = integer(0)
set.seed(42)
while(TRUE){
foo = c(foo, sample(1:8, 1))
if(all(1:8 %in% foo)) break
}
foo
# [1] 8 8 3 7 6 5 6 2 6 6 4 6 8 3 4 8 8 1
If you have more than 1:8
, it may be better to obtain the average number of tries (N) required to get all the numbers at least once and then sample N numbers such that all numbers are sampled at least once.
set.seed(42)
vec = 1:8
N = ceiling(sum(length(vec)/(1:length(vec))))
foo = sample(c(vec, sample(vec, N - length(vec), TRUE)))
foo
# [1] 3 6 8 3 8 8 6 4 5 6 1 6 4 6 6 3 5 7 2 2 7 8
Upvotes: 2
Reputation: 160417
Taking cue off of d.b, here's a slightly more verbose method that is more memory-efficient (and a little faster too, though I doubt speed is your issue):
Differences:
n-1
numbers have already been tabulated, no need to do that again)Code:
microbenchmark(
r2evans = {
emptyvec100 <- integer(100)
counter <- 0
out <- integer(0)
unseen <- seq_len(n)
set.seed(42)
repeat {
if (counter %% 100 == 0) out <- c(out, emptyvec100)
counter <- counter+1
num <- sample(n, size=1)
unseen <- unseen[unseen != num]
out[counter] <- num
if (!length(unseen)) break
}
out <- out[1:counter]
},
d.b = {
foo = integer(0)
set.seed(42)
while(TRUE){
foo = c(foo, sample(1:n, 1))
if(all(1:n %in% foo)) break
}
}, times = 100, unit = 'us')
# Unit: microseconds
# expr min lq mean median uq max neval
# r2evans 1090.007 1184.639 1411.531 1228.947 1320.845 11344.24 1000
# d.b 1242.440 1372.264 1835.974 1441.916 1597.267 14592.74 1000
(This is intended neither as code-golf nor speed-optimization. My primary goal is to argue against extend-by-one vector work, and suggest a more efficient comparison technique.)
As d.b further suggested, this works fine for 1:8
but may run into trouble with larger numbers. If we extend n
up:
(Edit: with d.b's code changes, the execution times are much closer, and not nearly as exponential looking. Apparently the removal of unique
had significant benefits to his code.)
Upvotes: 1