Reputation: 664
I am trying to draw from two different distributions with a probability 100000 times. Unfortunately I can't see what is wrong with my for loop, however, it only adds 1 value to simulated_data
instead of the desired 100,000 values.
Question 1: How can I fix this?
Question 2: Is there a far more efficient method where I don't have to loop through 100,000 items in a list?
#creating a vector of probabilities
probabilities <- rep(0.99,100000)
#creating a vector of booleans
logicals <- runif(length(probabilities)) < probabilities
#empty list for my simulated data
simulated_data <- c()
#drawing from two different distributions depending on the value in logicals
for(i in logicals){
if (isTRUE(i)) {
simulated_data[i] <- rnorm(n = 1, mean = 0, sd = 1)
}else{
simulated_data[i] <- rnorm(n = 1, mean = 0, sd = 10)
}
}
Upvotes: 1
Views: 1116
Reputation: 263411
Create a vector with the desired fraction of values from each distribution and then create a random permutation of the values:
N = 10000
frac =0.99
rand_mix = sample( c( rnorm( frac*N, 0, sd=1) , rnorm( (1-frac)*N, 0, sd=10) ) )
> table( abs(rand_mix) >1.96)
FALSE TRUE
9364 636
> (100000-636)/100000
[1] 0.99364
> table( rnorm(10000) >6)
FALSE
10000
The fraction is fixed. If you wante a possibly random fraction (but close to 0.99 statistically) then try this:
> table( sample( c( rnorm(10e6), rnorm(10e4, sd=10) ), 10e4) > 1.96 )
FALSE TRUE
97151 2849
Compare with:
> N = 100000
> frac =0.99
> rand_mix = sample( c( rnorm( frac*N, 0, sd=1) , rnorm( (1-frac)*N, 0, sd=10) ) )
> table( rand_mix > 1.96 )
FALSE TRUE
97117 2883
Upvotes: 0
Reputation: 9313
It seems that you want to create a final sample where each element is taken randomly from either sample1 or sample2, with probabilities 0.99 and 0.01.
The correct approach would be to generate both samples, each containing the same number of elements and then select randomly from either one.
The correct approach would be:
# Generate both samples
n = 100000
sample1 = rnorm(n,0,1)
sample2 = rnorm(n,0,10)
# Create the logical vector that will decide whether to take from sample 1 or 2
s1_s2 = runif(n) < 0.99
# Create the final sample
sample = ifelse(s1_s2 , sample1, sample2)
In this case, it is not guaranteed that there are exactly 0.99*n samples from sample1 and 0.01*n from sample2. In fact:
> sum(sample == sample1)
[1] 98953
This is close to 0.99*n, as expected, but not exactly.
Upvotes: 2
Reputation: 664
Here is a nice solution for anyone here:
n <- 100000
prob1 <- 0.99
prob2 <- 1-prob1
dist1 <- rnorm(prob1*n, 0, 1)
dist2 <- rnorm(prob2*n, 0, 10)
actual_sample <- c(dist1, dist2)
Upvotes: 0