Reputation: 3
I am trying to plot the mean variance of bootstrapped data against sample size.
I have set up a loop that randomly selects n rows from the bootstrapped data (1000 rows in total), computes the variance of that sample, and repeats this 10 times. At the moment I change the sample size by hand, one value at a time. Instead, I would like to do this for every sample size from n = 2 to n = 1000: compute the variance of a sample of 2, then 3, then 4, and so on up to 1000. What is the best way to do this? My current code is as follows:
set.seed(1)
bSamples <- 10
bResults <- rep(NA, bSamples)
for (b in seq_len(bSamples)) {
  bRows <- baseRan[sample(nrow(baseRan), 5), ]
  bValue <- var(bRows$Ranpopsize)
  bResults[[b]] <- bValue
}
BR <- data.frame(bResults)
BR
Upvotes: 0
Views: 45
Reputation: 17204
(Edited to incorporate OP's request in comment.) You can adapt your loop to iterate over the sample sizes and fill a data frame with the statistics for each one:
set.seed(1)
# example data
baseRan <- data.frame(id = 1:2000, Ranpopsize = rnorm(2000))
# initialize result dataframe
n_range <- 2:1000
BR <- data.frame(
  n = n_range,
  var = vector("double", length(n_range)),
  mean = vector("double", length(n_range)),
  ci95.lo = vector("double", length(n_range)),
  ci95.hi = vector("double", length(n_range))
)
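for (b in n_range) {
  # draw b rows without replacement and keep the column of interest
  bRanpop <- baseRan[sample(nrow(baseRan), b), ]$Ranpopsize
  bMean <- mean(bRanpop)
  # half-width of a normal-approximation 95% CI for the mean
  bConf <- (sd(bRanpop) / sqrt(b)) * 1.96
  # row b - 1 holds sample size b, because n_range starts at 2
  BR[b - 1, -1] <- c(var(bRanpop), bMean, bMean - bConf, bMean + bConf)
}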
head(BR)
#   n       var       mean    ci95.lo   ci95.hi
# 1 2 0.3568078  0.5214573 -0.3064052 1.3493199
# 2 3 3.2798144  0.1923345 -1.8570341 2.2417031
# 3 4 0.4607873  0.2721666 -0.3930703 0.9374034
# 4 5 0.9062647 -0.1553909 -0.9898376 0.6790558
# 5 6 1.3973509  0.5937734 -0.3521004 1.5396471
# 6 7 1.0208383 -0.1533831 -0.9018723 0.5951061
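The loop above draws a single sample per sample size. If the goal is the mean variance over several draws per size, as the 10 repetitions in the question suggest, one option is to repeat the draw at each n and average the results. This is only a minimal sketch under that assumption, using the same simulated `baseRan`; the names `n_reps`, `mean_var_at` and `BR_var` are illustrative and do not come from the original posts.

```r
set.seed(1)
# example data (simulated; replace with your own baseRan)
baseRan <- data.frame(id = 1:2000, Ranpopsize = rnorm(2000))

n_range <- 2:1000
n_reps <- 10  # draws per sample size (assumed from bSamples in the question)

# mean of the sample variance over n_reps draws of size n (without replacement)
mean_var_at <- function(n) {
  mean(replicate(n_reps, var(sample(baseRan$Ranpopsize, n))))
}

BR_var <- data.frame(
  n = n_range,
  mean_var = vapply(n_range, mean_var_at, numeric(1))
)
head(BR_var)
```

Sampling the column directly is equivalent here to sampling rows and then extracting the column, and `vapply()` avoids having to preallocate and index the result by hand.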
Plotting mean and 95% CI as a function of sample size:
library(ggplot2)
ggplot(BR, aes(n, mean)) +
  geom_ribbon(
    aes(ymin = ci95.lo, ymax = ci95.hi),
    fill = "red",
    alpha = .3
  ) +
  geom_line(linewidth = .2) + # on ggplot2 older than 3.4.0, use size = .2
  ylab("Mean (+/- 95% CI)") +
  theme_light()
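Since the question asks about variance rather than the mean, the same approach can be pointed at the variance instead. The following is a self-contained sketch, assuming simulated data and one draw per sample size; `var_by_n` and `p` are illustrative names, and the dashed reference line (the variance of the full column) is an addition for orientation, not something requested in the thread.

```r
library(ggplot2)

set.seed(1)
# example data (simulated; replace with your own baseRan)
baseRan <- data.frame(id = 1:2000, Ranpopsize = rnorm(2000))

n_range <- 2:1000
# one sample variance per sample size
var_by_n <- data.frame(
  n = n_range,
  var = vapply(
    n_range,
    function(n) var(sample(baseRan$Ranpopsize, n)),
    numeric(1)
  )
)

p <- ggplot(var_by_n, aes(n, var)) +
  geom_line() +
  # variance of the full column, drawn as a reference line
  geom_hline(yintercept = var(baseRan$Ranpopsize), linetype = "dashed") +
  xlab("Sample size (n)") +
  ylab("Sample variance") +
  theme_light()
p
```

The sample variance should scatter widely for small n and settle near the reference line as n grows.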
Upvotes: 1