LotsofQuestions

Reputation: 157

Simulating the t-distribution: random samples

I am new to simulation exercises in R. I want to create 1000 samples of size 25 from a t-distribution with 10 degrees of freedom.

Do I need to create a single vector of data from the rt generator, and then sample repeatedly from that? So, for example, I could create the vector:

singlevector <- rt(5000, 10), which generates 5000 values from a t-distribution with df = 10. I would treat this as my population and then sample from it. I chose the population size of 5000 arbitrarily here.

Or should I create my 1000 samples by calling the random t generator each time?

In other words, create a matrix with 25 rows and 1000 columns, each column containing the vector returned by a new call of rt(25, 10).

Upvotes: 3

Views: 1956

Answers (1)

Ben Bolker

Reputation: 226057

Since you are sampling independent, identically distributed values, all three of the following approaches are statistically equivalent:

  • call the random number generator once to get at least as many values as you need, then sample that vector without replacement
  • call the random number generator 1000 times, picking 25 values each time
  • call the random number generator once, picking 25000 values, then subdivide the vector into individual samples in order (rather than randomly)

The latter two are not just statistically but computationally equivalent: with the same seed, they return exactly the same numbers in the same order. In the first approach, the order of the values gets scrambled, but that makes no difference to the statistical properties.

Approach #1:

set.seed(101)
x1 <- rt(25000, 10)
## randomly assign each value to one of 1000 groups of 25, then bind the groups as columns
r1 <- do.call(cbind, split(x1, sample(0:24999) %/% 25))
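A possibly simpler way to write the same idea, as a minimal sketch: shuffle the vector with sample() (which samples without replacement by default) and reshape it with matrix(). The name r1_alt is made up for illustration and is not part of the code above; the result will not match r1 value for value, since the shuffle is done differently.

```r
set.seed(101)
x1 <- rt(25000, 10)                      # one long vector of t(10) draws
r1_alt <- matrix(sample(x1), nrow = 25)  # shuffle without replacement, then reshape

dim(r1_alt)                              # rows = sample size, columns = number of samples
## the matrix holds exactly the same values as x1, only reordered
all.equal(sort(as.vector(r1_alt)), sort(x1))
```

Each column of r1_alt is then one sample of size 25.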

Illustrating the equivalence of #2 and #3:

set.seed(101)
r2 <- replicate(1000, rt(25, 10))        ## 1000 separate calls, one column each
set.seed(101)
r3 <- matrix(rt(25000, 10), nrow = 25)   ## one call, filled column by column
identical(r2, r3)  ## TRUE

In general, solution #3 is fastest, but all of these approaches are very fast for problems of this order of magnitude: approximately 5 milliseconds (#3) vs. 10 milliseconds (#2) for 25 x 1000 samples on my laptop. I would pick whichever approach is easiest for you to understand when you read the code.
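If you want to check the timings on your own machine, here is a rough sketch using base R's system.time(). The repeat count n_reps and the variable names are my own choices for illustration, and the numbers you get will depend on your hardware and R version.

```r
n_samples <- 1000   # number of samples (columns)
n_each    <- 25     # size of each sample (rows)
n_reps    <- 100    # arbitrary repeat count, only there to make the timings measurable

## approach #2: one call to rt() per sample
t2 <- system.time(
  for (i in seq_len(n_reps)) replicate(n_samples, rt(n_each, 10))
)
## approach #3: a single call to rt(), then reshape
t3 <- system.time(
  for (i in seq_len(n_reps)) matrix(rt(n_samples * n_each, 10), nrow = n_each)
)

## average elapsed seconds per run
c(approach2 = t2[["elapsed"]], approach3 = t3[["elapsed"]]) / n_reps
```

For more careful measurements a dedicated benchmarking package would be a better fit, but this is enough to see the rough difference.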

Upvotes: 3
