Reputation: 157
I am new to simulation exercises in R. I want to create 1000 samples of size 25 from a t-distribution with 10 degrees of freedom.
Do I need to create a single vector of data from the rt generator and then sample repeatedly from that? For example, I could create the vector
singlevector <- rt(5000, 10)
which draws 5000 values from a t-distribution with df = 10. I would treat this as my population and then sample from it; I chose the population size of 5000 arbitrarily here.
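A minimal sketch of that first idea (variable names are just illustrative):
singlevector <- rt(5000, 10)                            # fixed pool of 5000 draws
resamples <- replicate(1000, sample(singlevector, 25))  # 25 x 1000; each column reuses the stored values
Note that sample() here picks 25 of the 5000 stored values without replacement, rather than generating fresh t values.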
Or should I create my 1000 samples by calling the random t generator every time? In other words, create a matrix with 25 rows and 1000 columns, each column containing the vector from a fresh call to rt(25, 10) (a loop version of this is sketched below).
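Written out as an explicit loop, that second idea would look something like:
samples <- matrix(NA_real_, nrow = 25, ncol = 1000)
for (i in 1:1000) samples[, i] <- rt(25, 10)   # each column is a fresh sample of size 25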
Upvotes: 3
Views: 1956
Reputation: 226057
Since you are sampling independent, identically distributed values, all three of the approaches shown below are statistically equivalent. The latter two are not just statistically but computationally equivalent: given the same seed, they produce identical matrices. In the first approach the order of the samples gets scrambled, but that makes no difference to the statistical properties.
Approach #1:
set.seed(101)
x1 <- rt(25000, 10)   # one long vector of 25000 draws
## randomly assign the 25000 values to 1000 groups of 25, then bind the groups into columns
r1 <- do.call(cbind, split(x1, sample(0:24999) %/% 25))
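As a quick sanity check, you can verify that approach #1 merely reorders the same draws:
identical(sort(x1), sort(c(r1)))  ## TRUE: same values, different arrangement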
Illustrating the equivalence of #2 and #3:
set.seed(101)
r2 <- replicate(1000, rt(25, 10))        # 1000 separate calls to rt(), bound into a 25 x 1000 matrix
set.seed(101)
r3 <- matrix(rt(25000, 10), nrow = 25)   # one call to rt(), filled column-wise into 25 x 1000
identical(r2, r3)  ## TRUE
In general solution #3 is fastest, but all of these approaches are very fast for problems of this order of magnitude: approximately 5 milliseconds (#3) vs. 10 milliseconds (#2) for 25 x 1000 samples on my laptop. I would pick whichever approach is easiest for you to understand when you read the code.
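Whichever you pick, the one-sample-per-column layout makes downstream summaries easy; for example, if the goal is the sampling distribution of per-sample statistics:
sample_means <- colMeans(r3)       # one mean per simulated sample of size 25
sample_sds   <- apply(r3, 2, sd)   # one standard deviation per sample
hist(sample_means)                 # approximate sampling distribution of the mean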
Upvotes: 3