Stefano Lombardi
Stefano Lombardi

Reputation: 1591

R: running standard deviations

I need to compute the "running" standard deviations of a vector of means, computing them as if the sample size is increasing. In other words, I need to compute the sd of mean_y[1], of mean_y[1] and mean_y[2], of mean_y[1], mean_y[2], mean_y[3] and so on,

where:

y <- rnorm(10000, mean=5, sd=2) 
i <- seq(1 : length(y))

mean_y <- cumsum(y)/i

Trying to apply the same criterion (hence, not explicitly using a loop), I produced the following code for the standard deviations of the vector of running means:

se <- sqrt(1/i^2*cumsum(y^2) - 1/i*mean_y^2)

This beacuse, var(mean(x)) = 1/n*var(x). To me, the code seems ok. But when I plot the runnning means with their confidence intervals against i (the increasing sample size), the 95% bands perfectly coincide with the means!

The code is:

error <- qnorm(0.975)*se/sqrt(i)
lower <- mean_y - error
upper <- mean_y + error

# plotting means and ci's against sample size (= up to 10000)

plot(x=i, y=mean_y, xlab="Number of iterations (sample size)", 
ylab="E[y] estimates and 95% CI's", cex=0.4, ylim=c(4.6, 5.4))
lines(lower, col="gold")
lines(upper, col="gold")

The rationale is to produce a graph showing the convergence of the estimator "mean_y" as the sample size keeps increasing.

Can anyone help me? Probably there is some kind of basic error, either in the se formula or in lower and upper. Thanks!! Stefano

Upvotes: 1

Views: 694

Answers (1)

Stefano Lombardi
Stefano Lombardi

Reputation: 1591

it turned out that I was erroneously dividing by i. Also, by not considering the "n-1" correction for the variance, my formula was not comparable with the one suggested by Carl Witthoft.

In the following lines you find three equivalent ways for solving the initial problem and plotting the same graph:

i <- seq(1 : length(y))
m <- cumsum(y)/i

se_y <- sqrt((1/(i-1)*cumsum(y^2) - i/(i-1)*m^2))

error <- qnorm(0.975)*se_y/sqrt(i)
lower <- m - error
upper <- m + error

# equivalent (slightly slower) methods for getting the std. errors

# method2:
se_2 <- rep(NA, length(y))
for (n in 1:length(y))  {
  se_2[n] <- sd(y[1:n])
}
# method3:
se_3 <- sapply(1:length(y), FUN= function(x) sd(y[1:x]))

The final graph is:

# plotting means and ci's against sample size (= up to 10000)
plot(x=i, y=m, xlab="Number of iterations (sample size)", 
title("Convergence of the ENVP's mean"), 
ylab="E[y] estimates and 95% CI's (EUR millions)", cex=0.4, ylim=c(2620, 2665))
lines(lower, col="gold")
lines(upper, col="gold")
legend("bottomright", legend=c("envp's mean", "95% ci"),
cex=0.8, col=c("black", "gold"), lwd=2, lty=1, bty="n")

dev.copy(tiff, file="mc_envp.tiff", height=6, width=6, units="in", res=200)
dev.off(); dev.off()
windows.options(reset=TRUE)

Hope all this will help!

Upvotes: 1

Related Questions