Mean Confidence Interval of 400 random samples

Question

I have a question about R.

I have a population of 200 employees, and I know the mean and standard deviation of working hours for the whole population.

The following must be repeated 400 times:

1) Collect a small random sample of 6 people from the population.

2) Construct a 90% confidence interval for the mean (μ), assuming that the population size is infinite.

3) Among the 400 confidence intervals constructed in 2), count how many do not contain the mean (μ) of the whole population.

I have collected the samples, but I am unable to build the confidence intervals.

Here is what I have done so far:

population <- data$hours01
n <- 6
left <- numeric(400)     # lower bounds, preallocated
right <- numeric(400)    # upper bounds, preallocated
for (i in 1:400) {
    ech <- sample(population, n)
    half <- 1.645 * sd(ech) / sqrt(n)    # normal-based 90% half-width
    left[i] <- mean(ech) - half
    right[i] <- mean(ech) + half
}

Here are the data.

alistaire · Accepted Answer

You can build a function to calculate the confidence interval, and then apply it to samples with replicate to generate a matrix of confidence intervals, which you can check against the population mean.

There is a possible complication: when the population standard deviation is unknown, confidence intervals are calculated with the t distribution, but when it is known, the normal distribution is used. With many degrees of freedom the two give nearly the same result, but each sample here has only 5, so the difference matters.
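
To make the size of that difference concrete, here is a minimal sketch comparing the two critical values for a two-sided 90% interval with a sample of 6. The variable names are illustrative only.

```r
# Two-sided 90% interval: 5% in each tail, so use the 0.95 quantile
conf.level <- 0.90
p <- 1 - (1 - conf.level) / 2    # 0.95
n <- 6

z_crit <- qnorm(p)             # normal critical value, about 1.645
t_crit <- qt(p, df = n - 1)    # t critical value with 5 df, about 2.015

c(z = z_crit, t = t_crit, ratio = t_crit / z_crit)
```

With only 5 degrees of freedom, the t-based interval comes out roughly 22% wider than the normal-based one for the same sample standard deviation.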

Thus, to build a robust function for the confidence interval, you would need something like

ci <- function(x, conf.level, sd = NULL){
    conf.level <- mean(c(conf.level, 1))    # two-sided: .9 becomes the .95 quantile
    mean.x <- mean(x)
    if (is.null(sd)) {    # when standard deviation unknown,
        sd <- sd(x)    # use sample standard deviation
        z <- qt(conf.level, length(x) - 1)    # and t distribution
    } else {
        z <- qnorm(conf.level)    # when known, use normal
    }
    int <- z * sd / sqrt(length(x))
    c(low = mean.x - int, 
      high = mean.x + int)
}
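
One way to sanity-check the unknown-sd case is to compare a hand-computed interval against the one returned by `t.test`. The sketch below is standalone (it does not call `ci`) and uses the first six values of `heur01` as an arbitrary example sample.

```r
x <- c(1411, 1734, 1048, 2060, 1983, 1810)    # first six values of heur01
conf.level <- 0.90

# half-width from the t distribution and the sample standard deviation
half <- qt(1 - (1 - conf.level) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
manual <- c(low = mean(x) - half, high = mean(x) + half)

# the same interval as computed by the built-in one-sample t-test
builtin <- t.test(x, conf.level = conf.level)$conf.int

all.equal(unname(manual), as.numeric(builtin))
```

If the two agree, the t-based formula is implemented as intended.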

To try it out,

set.seed(47)    # make sampling reproducible

# make a matrix of confidence intervals
ints <- replicate(400, ci(sample(heur01, 6), .9, sd(heur01)))

ints[, 1:5]
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> low  1443.959 1441.625 1376.459 1486.625 1436.959
#> high 1865.041 1862.708 1797.541 1907.708 1858.041

# calculate number of intervals that don't contain mean
mean.x <- mean(heur01)
sum(mean.x < ints[1,] | mean.x > ints[2,])
#> [1] 37
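
For context, if each interval independently had exactly 90% coverage, the number of misses among 400 would follow a binomial distribution. The sketch below works out the expected count under that idealized assumption (it ignores the small effect of sampling without replacement from 200 values); the variable names are illustrative.

```r
n_intervals <- 400
miss_prob <- 0.10    # 1 - 0.90 nominal coverage

expected_misses <- n_intervals * miss_prob                      # 40
sd_misses <- sqrt(n_intervals * miss_prob * (1 - miss_prob))    # 6

# central 95% range of miss counts under the binomial model
likely_range <- qbinom(c(0.025, 0.975), size = n_intervals, prob = miss_prob)

c(expected = expected_misses, sd = sd_misses)
likely_range
```

The 37 misses observed above sit within one standard deviation of the expected 40.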

To see that the results do, in fact, differ when the standard deviation isn't supplied, repeat the whole experiment 100 times each way and compare the miss counts:

set.seed(47)
with_sd <- replicate(100, {
    ints <- replicate(400, ci(sample(heur01, 6), .9, sd(heur01)))
    sum(mean.x < ints[1,] | mean.x > ints[2,])
})
summary(with_sd)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    27.0    34.0    37.0    37.5    41.0    50.0

set.seed(47)
no_sd <- replicate(100, {
    ints <- replicate(400, ci(sample(heur01, 6), .9))
    sum(mean.x < ints[1,] | mean.x > ints[2,])
})
summary(no_sd)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   29.00   43.00   46.00   47.07   52.00   66.00

t.test(with_sd, no_sd)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  with_sd and no_sd
#> t = -11.472, df = 187.14, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -11.215668  -7.924332
#> sample estimates:
#> mean of x mean of y 
#>     37.50     47.07
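
Dividing the mean miss counts by 400 turns them into empirical coverage rates, which may be easier to compare with the nominal 90%. This snippet only reuses the two means printed in the `t.test` output; the variable names are illustrative.

```r
n_intervals <- 400
mean_misses <- c(with_sd = 37.50, no_sd = 47.07)    # means reported above

# share of intervals that did contain the population mean
coverage <- 1 - mean_misses / n_intervals
round(coverage, 3)
```

That works out to about 90.6% coverage when the population standard deviation is supplied and about 88.2% when it is estimated from each sample.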

Data

heur01 <- c(1411L, 1734L, 1048L, 2060L, 1983L, 1810L, 1387L, 1637L, 1419L, 1637L, 1185L, 1766L, 1484L, 1983L, 
    1217L, 1915L, 1846L, 1887L, 1742L, 988L, 1375L, 1193L, 2056L, 1919L, 1850L, 2076L, 1463L, 1113L, 1887L, 
    1919L, 1734L, 1157L, 1766L, 1951L, 1923L, 2173L, 1609L, 1895L, 1109L, 1028L, 1701L, 1875L, 1677L, 1653L, 
    1883L, 1677L, 1850L, 1738L, 1520L, 1415L, 1992L, 1919L, 1653L, 1625L, 1705L, 1742L, 1891L, 2108L, 1919L, 
    1911L, 1770L, 1834L, 1911L, 2060L, 1717L, 1943L, 1859L, 1738L, 1222L, 1709L, 2052L, 1141L, 1931L, 2068L, 
    2044L, 1725L, 1818L, 1798L, 1943L, 1939L, 1919L, 1790L, 2116L, 1750L, 2052L, 1605L, 1798L, 2169L, 1665L, 
    1673L, 1185L, 1717L, 1717L, 1657L, 1915L, 1778L, 2121L, 1786L, 1774L, 2056L, 1738L, 1883L, 1754L, 1790L, 
    1770L, 1947L, 1867L, 1794L, 1867L, 1790L, 1762L, 2080L, 1778L, 1903L, 1734L, 1838L, 1560L, 1592L, 1637L, 
    1467L, 1750L, 1653L, 1222L, 1709L, 1806L, 1334L, 1584L, 2052L, 1802L, 1774L, 1770L, 1258L, 1334L, 1322L, 
    1826L, 1600L, 2189L, 1907L, 1548L, 1617L, 1693L, 1020L, 992L, 1435L, 1613L, 1738L, 1419L, 1121L, 1629L, 
    1605L, 1455L, 1157L, 1717L, 1294L, 1359L, 1282L, 1758L, 1395L, 1129L, 1189L, 1790L, 1217L, 1133L, 1516L, 
    1516L, 1278L, 1072L, 911L, 1286L, 968L, 1076L, 1315L, 1221L, 1268L, 939L, 1879L, 986L, 1221L, 1456L, 
    1315L, 1785L, 1080L, 1362L, 1503L, 1127L, 1691L, 1174L, 1644L, 1691L, 939L, 1503L, 1080L, 1503L, 1832L, 
    1362L, 1691L, 1456L, 1879L, 1644L, 1033L)
