terminallyConfused
terminallyConfused

Reputation: 23

Simulating averages of normally-distributed random variables in R

I'm trying to simulate some data in R to check my manual calculations of how variance changes in a simple model that involves a sequence of normally-distributed random variables being averaged. However, I find I'm getting results that are not only inconsistent with my manual calculations, but inconsistent with each other. Clearly I'm doing something wrong, but I'm having trouble isolating the problem(s).

Conceptually, the model involves two steps: First, storing a variable, and second, using the stored variable(s) to produce an output. The output is then stored as a new variable, contributing to future outputs, and so on. I assume storage is noisy (i.e., what's stored is a random variable rather than a constant), but that no further noise is added in output production, which simply involves averaging the existing stored variables. Thus, my model involves the following steps, where V_i is the variable stored at step i, and O_i is the ith output:

image of model equations

and so on.

I've tried simulating this in R in two ways: First,

nSamples <- 100000
o1 <- rnorm(nSamples) # First output
o2 <- rowMeans(cbind(rnorm(nSamples, mean=o1),rnorm(nSamples))) # Second output, averaged from first two stored variables.
o3 <- rowMeans(cbind(rnorm(nSamples, mean=o2),rnorm(nSamples, mean=o1),rnorm(nSamples))) # Third output, averaged from first three stored variables.

This gives me

var(o1) # Approximately 1, as per my manual calculations.
var(o2) # Approximately .75, as per my manual calculations.
var(o3) # Approximately .64, where my manual calculations give 19/36 or approximately .528.

Initially, I trusted the code and assumed my calculations were wrong. Then, I tried the following alternative code, which more explicitly follows the steps I used manually:

nSamples <- 100000
initialValue <- 0
v1 <- rnorm(nSamples, initialValue)
o1 <- v1
v2 <- rnorm(nSamples, o1)
o2 <- rowMeans(cbind(v1, v2))
v3 <- rnorm(nSamples, o2)
o3 <- rowMeans(cbind(v1, v2, v3))

This gives me

var(o1) # Approximately 1, as per my calculations.
var(o2) # Approximately 1.25, where my calculations give .75.
var(o3) # Approximately 1.36, where my calculations give approximately .528.

Thus, clearly I have done something wrong in using at least two of these three methods, but I'm having trouble isolating the source of the problems. What is it I'm missing that is leading my code to behave differently than what I'm expecting? And what is the difference between the two code examples that leads the variance to decrease in one and increase in the other?

Upvotes: 2

Views: 113

Answers (1)

mastropi
mastropi

Reputation: 1414

Your correct calculation is the first one, where you are generating new realizations of the normal random variable when averaging, as opposed to using the realizations generated in the previous step.

In fact, the distribution of O2 assumes that the two normal random variables being averaged are mutually independent.

In your second calculation, this is not true, as you are averaging v1 and v2, which are not independent since both depend on o1. This is why you get larger variances in the second case.

Upvotes: 1

Related Questions