PeterLL

Reputation: 3

Increasing sample size by 1 from previous loop?

I am attempting to plot the mean-variance from bootstrapped data against sample size.

I have set up a loop that randomly selects n rows of the bootstrapped data (1000 rows in total), computes the variance of that sample, stores it, and repeats this 10 times. However, I would rather do this for every sample size from n=2 to n=1000, i.e. calculate the variance of a sample of 2, then 3, then 4, then 5, all the way up to a sample of 1000. At the moment I am changing the sample size by hand, one value at a time. What is the best way to do this? My current code is as follows:

set.seed(1)
bSamples <- 10
bResults <- rep(NA, bSamples)


for (b in seq_len(bSamples)) { 
  bRows<-baseRan[sample(nrow(baseRan), 5), ]
  bValue <- var(bRows$Ranpopsize)
  bResults[[b]] <- bValue }

BR = data.frame(bResults)
BR

Answers (1)

zephryl

Reputation: 17204

(Edited to incorporate OP's request in comment.) You can adapt your loop to iterate over sample size and populate a dataframe with stats for each:

set.seed(1)

# example data
baseRan <- data.frame(id = 1:2000, Ranpopsize = rnorm(2000))  

# initialize result dataframe
n_range <- 2:1000
BR <- data.frame(
  n = n_range,
  var = vector("double", length(n_range)),
  mean = vector("double", length(n_range)),
  ci95.lo = vector("double", length(n_range)),
  ci95.hi = vector("double", length(n_range))
)

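for (b in n_range) {
  # draw b rows without replacement and keep the column of interest
  bRanpop <- baseRan[sample(nrow(baseRan), b), ]$Ranpopsize
  bMean <- mean(bRanpop)
  # half-width of a normal-approximation 95% CI for the mean
  bConf <- (sd(bRanpop) / sqrt(b)) * 1.96
  # row b - 1 holds sample size b, because n_range starts at 2
  BR[b - 1, -1] <- c(var(bRanpop), bMean, bMean - bConf, bMean + bConf)
}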

head(BR)
#   n       var       mean    ci95.lo   ci95.hi
# 1 2 0.3568078  0.5214573 -0.3064052 1.3493199
# 2 3 3.2798144  0.1923345 -1.8570341 2.2417031
# 3 4 0.4607873  0.2721666 -0.3930703 0.9374034
# 4 5 0.9062647 -0.1553909 -0.9898376 0.6790558
# 5 6 1.3973509  0.5937734 -0.3521004 1.5396471
# 6 7 1.0208383 -0.1533831 -0.9018723 0.5951061
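
The loop writes to row b - 1, which only lines up because n_range starts at 2. As a rough sketch of an alternative that does not depend on the starting value, the per-sample statistics can be wrapped in a small helper and the rows bound together afterwards. The names sample_stats and BR2 are made up for this illustration and are not part of the original code:

```r
set.seed(1)

# example data with the same shape as above
baseRan <- data.frame(id = 1:2000, Ranpopsize = rnorm(2000))

# hypothetical helper: summary statistics for one random sample of size n
sample_stats <- function(n, x) {
  s <- sample(x, n)                      # sample without replacement
  half_width <- 1.96 * sd(s) / sqrt(n)   # normal-approximation 95% CI
  data.frame(
    n = n,
    var = var(s),
    mean = mean(s),
    ci95.lo = mean(s) - half_width,
    ci95.hi = mean(s) + half_width
  )
}

n_range <- 2:1000

# one row per sample size, stacked into a single data frame
BR2 <- do.call(rbind, lapply(n_range, sample_stats, x = baseRan$Ranpopsize))
```

BR2 has the same columns as BR, so it should work with the same plotting code. Binding 999 one-row data frames is slower than filling a preallocated frame, which is unlikely to matter at this size.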

Plotting mean and 95% CI as a function of sample size:

library(ggplot2)

ggplot(BR, aes(n, mean)) +
  geom_ribbon(
    aes(ymin = ci95.lo, ymax = ci95.hi),
    fill = "red",
    alpha = .3
  ) +
  geom_line(linewidth = .2) +  # on ggplot2 < 3.4.0, use size = .2 instead
  ylab("Mean (+/- 95% CI)") +
  theme_light()
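
The question also mentions repeating the draw 10 times per sample size. If the goal is the average variance across those repeats for each n, one possible sketch (assuming 10 repeats and sampling without replacement, as in the question's loop) is the following. The names n_reps, mean_var and BRvar are introduced here only for illustration:

```r
set.seed(1)

# example data with the same shape as above
baseRan <- data.frame(id = 1:2000, Ranpopsize = rnorm(2000))

n_range <- 2:1000
n_reps <- 10  # repeats per sample size (assumed from the question)

# for each n, draw n_reps samples of size n and average their variances
mean_var <- vapply(
  n_range,
  function(n) mean(replicate(n_reps, var(sample(baseRan$Ranpopsize, n)))),
  numeric(1)
)

BRvar <- data.frame(n = n_range, mean_var = mean_var)
```

BRvar can then be plotted in the same way, for example with aes(n, mean_var) and geom_line().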
