Reputation: 27
When I try to replicate some results, I run into a problem.
l <- {}
for(i in 1:3){
set.seed(1)
l[i] <- rnorm(n = 1, i, i)
}
This produces
0.3735462 0.7470924 1.1206386
However, if I write
set.seed(1)
rnorm(n = 3, 1:3, 1:3)
0.3735462 2.3672866 0.4931142
Or
set.seed(1)
rmvnorm(n = 1, 1:3, sqrt(diag(1:3)))
0.3735462 2.21839 1.900251
I don't get the same result. What could be the problem? My goal is to vectorize the for loop, which is how I ran into this.
UPDATE
The answer below explains how this works for rnorm, and the same should hold for all random number generators in R. However, when I try this approach with rgig (Generalized Inverse Gaussian distribution), I run into a problem again.
l <- {}
for(i in 1:3){
set.seed(1)
l[i] <- rgig(n = i, i, i, i)[i]
}
1.629091 1.500733 1.564364
and if I use
set.seed(1)
rgig(n = 3, 1:3, 1:3, 1:3)
1.629091 1.440166 3.264135
When I use
sapply(1:3,function(x){set.seed(1);rgig(x,x,x,x)})
it doesn't show a pattern similar to the one for rnorm. My assumption is that rgig doesn't support vectorized parameters, since if we write:
set.seed(1)
rgig(n = 3, 1, 1, 1)
1.629091 1.440166 3.264135
we get the same result as for the vectorized call. Am I right?
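One way to check this assumption is to compare, under the same seed, a call with vector parameters against a call that uses only their first elements. The sketch below does that in base R; `uses_vector_params` and `first_only` are hypothetical names introduced for illustration, and the check is demonstrated on rnorm only, not on rgig itself.

```r
# Hypothetical helper: TRUE if the output changes when the parameters are
# passed as vectors instead of only their first elements.
# Note: it resets the RNG seed as a side effect.
uses_vector_params <- function(rfun, n, ...) {
  args <- list(...)
  set.seed(1)
  with_vectors <- do.call(rfun, c(list(n), args))
  set.seed(1)
  with_scalars <- do.call(rfun, c(list(n), lapply(args, `[`, 1)))
  !isTRUE(all.equal(with_vectors, with_scalars))
}

# rnorm recycles its mean and sd vectors, so the two calls differ.
rnorm_vectorized <- uses_vector_params(rnorm, 3, 1:3, 1:3)

# A hypothetical generator that silently ignores all but the first element.
first_only <- function(n, mean, sd) rnorm(n, mean[1], sd[1])
first_only_vectorized <- uses_vector_params(first_only, 3, 1:3, 1:3)
```

If the same check on rgig returns FALSE, that would support the assumption that only the first element of each parameter is used.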
Upvotes: 2
Views: 642
Reputation: 79188
For the first method
l <- {}
for(i in 1:3){
set.seed(1)
l[i] <- rnorm(n = 1, i, i)
}
0.3735462 0.7470924 1.1206386
for the second method
set.seed(1)
rnorm(n = 3, 1:3, 1:3)
0.3735462 2.3672866 0.4931142
Your question is: why don't the two methods produce the same results?
To answer this, I would first say that the two methods DO PRODUCE CONSISTENT RESULTS. Now let's see why the values from the pseudorandom generation are different. The simplest way is to reset the seed and draw i numbers with mean i and sd i, for each i, and see what happens:
sapply(1:3,function(x){set.seed(1);rnorm(x,x,x)})
[[1]]
[1] 0.3735462 #One number produced from mu=1 sd=1
[[2]]
[1] 0.7470924 2.3672866 # Two numbers produced from mu=2 sd=2
[[3]]
[1] 1.1206386 3.5509300 0.4931142 # Three numbers produced from mu=3 sd=3
If you look at this list, you will notice that the for loop takes the first number of each vector, while the second method takes the last number of each vector (the i-th of i draws). That is why the numbers are different. In the end the results are consistent, since in both cases the i-th number is produced with the same mean and sd.
Thus
set.seed(1)
rnorm(3,1:3,1:3)
is equivalent to
l <- {}
for(i in 1:3){
set.seed(1)
l[i] <- rnorm(n = i, i, i)[i]
}
l
[1] 0.3735462 2.3672866 0.4931142
rnorm(3,1:3,1:3)
[1] 0.3735462 2.3672866 0.4931142
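Another way to see the same thing, as a minimal sketch assuming R's default normal generator: the vectorized call behaves like three standard normal draws `z`, each shifted and scaled by its own mean and sd, while the loop with the seed reset reuses only the first draw. The names `mu`, `sigma`, `z`, `vectorized` and `looped` are introduced here for illustration.

```r
mu    <- 1:3
sigma <- 1:3

# Three standard normal draws from seed 1.
set.seed(1)
z <- rnorm(3)

# The vectorized call with the same seed.
set.seed(1)
vectorized <- rnorm(3, mu, sigma)

# The original loop: the seed is reset before every single draw.
looped <- numeric(3)
for (i in 1:3) {
  set.seed(1)
  looped[i] <- rnorm(1, mu[i], sigma[i])
}

# Element i of the vectorized result uses the i-th draw ...
all.equal(vectorized, mu + sigma * z)
# ... while every element of the loop result uses the first draw.
all.equal(looped, mu + sigma * z[1])
```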
Upvotes: 0
Reputation: 6132
With your loop, you do this:
set.seed(1)
rnorm(n = 1, 1, 1)
set.seed(1)
rnorm(n = 1, 2, 2)
set.seed(1)
rnorm(n = 1, 3, 3)
With your 2nd line of code you do this:
set.seed(1)
rnorm(3, 1:3, 1:3)
Hence the different results. In other words: with the loop you call set.seed(1) and then draw one number, three times. First you draw from a distribution with a mean and sd of 1, then from one with a mean and sd of 2, and finally from one with a mean and sd of 3, each time starting from the same seed.
With the other call you sample 3 numbers directly, using vectors of means and sd's consisting of 1, 2 and 3. There the seed is set once, and all three numbers are generated in a single call.
To get the same results as your for loop without the loop, you would need this code:
> set.seed(1)
> rnorm(n = 1, 1, 1)
[1] 0.3735462
> set.seed(1)
> rnorm(n = 1, 2, 2)
[1] 0.7470924
> set.seed(1)
> rnorm(n = 1, 3, 3)
[1] 1.120639
Upvotes: 2