gulraiz safdar

Mean Confidence Interval of 400 random samples

Hi, I have a question about R.

I have a population of 200 employees, and I know the mean and standard deviation (sd) of working hours for the whole population.

Steps 1 and 2 must be repeated 400 times:

1) Collect a small random sample of 6 people from the population.

2) Construct a 90% confidence interval for the mean (μ) (assume that the population size is infinite).

3) Among the 400 confidence intervals constructed in step 2, count how many do not contain the population mean μ.

I can draw the samples, but I am unable to build the confidence intervals.

Here is what I have done so far:

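population <- data$hours01    # working hours of all 200 employees
n <- 6                        # sample size
right <- numeric(400)         # upper bounds of the intervals
left <- numeric(400)          # lower bounds of the intervals
for (i in 1:400) {
    ech <- sample(population, n)
    right[i] <- mean(ech) + 1.645 * sd(ech) / sqrt(n)
    left[i] <- mean(ech) - 1.645 * sd(ech) / sqrt(n)
}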

Here are the data:

heur01
    1411
    1734
    1048
    2060
    1983
    1810
    1387
    1637
    1419
    1637
    1185
    1766
    1484
    1983
    1217
    1915
    1846
    1887
    1742
    988
    1375
    1193
    2056
    1919
    1850
    2076
    1463
    1113
    1887
    1919
    1734
    1157
    1766
    1951
    1923
    2173
    1609
    1895
    1109
    1028
    1701
    1875
    1677
    1653
    1883
    1677
    1850
    1738
    1520
    1415
    1992
    1919
    1653
    1625
    1705
    1742
    1891
    2108
    1919
    1911
    1770
    1834
    1911
    2060
    1717
    1943
    1859
    1738
    1222
    1709
    2052
    1141
    1931
    2068
    2044
    1725
    1818
    1798
    1943
    1939
    1919
    1790
    2116
    1750
    2052
    1605
    1798
    2169
    1665
    1673
    1185
    1717
    1717
    1657
    1915
    1778
    2121
    1786
    1774
    2056
    1738
    1883
    1754
    1790
    1770
    1947
    1867
    1794
    1867
    1790
    1762
    2080
    1778
    1903
    1734
    1838
    1560
    1592
    1637
    1467
    1750
    1653
    1222
    1709
    1806
    1334
    1584
    2052
    1802
    1774
    1770
    1258
    1334
    1322
    1826
    1600
    2189
    1907
    1548
    1617
    1693
    1020
    992
    1435
    1613
    1738
    1419
    1121
    1629
    1605
    1455
    1157
    1717
    1294
    1359
    1282
    1758
    1395
    1129
    1189
    1790
    1217
    1133
    1516
    1516
    1278
    1072
    911
    1286
    968
    1076
    1315
    1221
    1268
    939
    1879
    986
    1221
    1456
    1315
    1785
    1080
    1362
    1503
    1127
    1691
    1174
    1644
    1691
    939
    1503
    1080
    1503
    1832
    1362
    1691
    1456
    1879
    1644
    1033

Answers (1)

alistaire

You can build a function to calculate the confidence interval, and then apply it to samples with replicate to generate a matrix of confidence intervals, which you can check against the population mean.
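As a small illustration of that matrix shape (a toy sketch with arbitrary numbers, not part of the original answer): when the expression given to replicate returns a named vector of length 2, replicate simplifies the results into a 2-row matrix with one column per replicate, which is why the intervals can later be indexed as ints[1, ] and ints[2, ]. The name fake_ci is hypothetical.

```r
# Hypothetical stand-in for ci(): returns a named length-2 vector (values are arbitrary)
fake_ci <- function() c(low = 1, high = 2)

# replicate() simplifies three length-2 results into a 2 x 3 matrix
m <- replicate(3, fake_ci())
dim(m)        # 2 3
rownames(m)   # "low" "high"
```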

There is a possible complication: when the population standard deviation is unknown, confidence intervals are calculated with the t distribution, but when it is known, the normal distribution is used. With many degrees of freedom this makes very little difference, but each sample of 6 gives only 5 degrees of freedom, so the difference matters here.
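To put a number on that difference, here is a quick sketch (an addition, not from the original answer) comparing the normal quantile with the t quantile for 5 degrees of freedom, both at the 0.95 level that a two-sided 90% interval needs:

```r
# Upper quantiles for a two-sided 90% interval (0.05 in each tail)
z_crit <- qnorm(0.95)            # about 1.645, used when sigma is known
t_crit <- qt(0.95, df = 6 - 1)   # about 2.015, used when sigma is estimated from n = 6

# Ratio of interval widths for the same standard deviation
t_crit / z_crit
```

So for the same standard deviation, a t-based interval from a sample of 6 is roughly 22% wider than the z-based one.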

Thus, to build a robust function for the confidence interval, you would need something like

ci <- function(x, conf.level, sd = NULL){
    # two-sided interval: e.g. 0.9 becomes 0.95, the upper quantile needed
    conf.level <- mean(c(conf.level, 1))
    mean.x <- mean(x)
    if (is.null(sd)) {    # when standard deviation unknown,
        sd <- sd(x)    # use sample standard deviation
        z <- qt(conf.level, length(x) - 1)    # and t distribution
    } else {
        z <- qnorm(conf.level)    # when known, use normal
    }
    int <- z * sd / sqrt(length(x))    # half-width of the interval
    c(low = mean.x - int, 
      high = mean.x + int)
}
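One way to sanity-check the unknown-sd branch (a sketch, not part of the original answer) is to compare it with the interval reported by base R's t.test, which uses the same mean ± t quantile × s/√n construction. The function is repeated so the snippet runs on its own, and the six values are simply the first six observations of heur01:

```r
ci <- function(x, conf.level, sd = NULL){
    conf.level <- mean(c(conf.level, 1))        # 0.9 -> 0.95 (two-sided)
    mean.x <- mean(x)
    if (is.null(sd)) {
        sd <- sd(x)                             # sample standard deviation
        z <- qt(conf.level, length(x) - 1)      # t quantile
    } else {
        z <- qnorm(conf.level)                  # normal quantile
    }
    int <- z * sd / sqrt(length(x))
    c(low = mean.x - int, high = mean.x + int)
}

x <- c(1411, 1734, 1048, 2060, 1983, 1810)      # first six values of heur01

mine <- ci(x, 0.9)                              # t-based interval from ci()
ref  <- t.test(x, conf.level = 0.9)$conf.int    # base R's t interval

# Drop names and attributes before comparing; should be TRUE
all.equal(unname(mine), as.numeric(ref))
```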

To try it out,

set.seed(47)    # make sampling reproducible

# make a matrix of confidence intervals
ints <- replicate(400, ci(sample(heur01, 6), .9, sd(heur01)))

ints[, 1:5]
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> low  1443.959 1441.625 1376.459 1486.625 1436.959
#> high 1865.041 1862.708 1797.541 1907.708 1858.041

# calculate number of intervals that don't contain mean
mean.x <- mean(heur01)
sum(mean.x < ints[1,] | mean.x > ints[2,])
#> [1] 37
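For context on the count of 37 (this reasoning is an addition, and it assumes the intervals achieve exactly their nominal 90% coverage): each of the 400 independent intervals would then miss μ with probability 0.1, so the number of misses follows a Binomial(400, 0.1) distribution. A quick sketch of its mean, standard deviation and central 95% range:

```r
n_int  <- 400    # number of intervals
p_miss <- 0.1    # miss probability if coverage is exactly 90% (assumption)

expected <- n_int * p_miss                        # mean number of misses: 40
spread   <- sqrt(n_int * p_miss * (1 - p_miss))   # standard deviation: 6

# Central 95% range of miss counts under this binomial model
qbinom(c(0.025, 0.975), size = n_int, prob = p_miss)
```

A count of 37 lies within one standard deviation of 40, which is consistent with the z-based intervals behaving close to their nominal level here.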

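To see that the results do, in fact, differ when the standard deviation isn't specified (so the sample standard deviation and the t distribution are used), repeat the whole experiment 100 times each way and compare: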

set.seed(47)
with_sd <- replicate(100, {
    ints <- replicate(400, ci(sample(heur01, 6), .9, sd(heur01)))
    sum(mean.x < ints[1,] | mean.x > ints[2,])
})
summary(with_sd)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    27.0    34.0    37.0    37.5    41.0    50.0

set.seed(47)
no_sd <- replicate(100, {
    ints <- replicate(400, ci(sample(heur01, 6), .9))
    sum(mean.x < ints[1,] | mean.x > ints[2,])
})
summary(no_sd)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   29.00   43.00   46.00   47.07   52.00   66.00

t.test(with_sd, no_sd)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  with_sd and no_sd
#> t = -11.472, df = 187.14, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -11.215668  -7.924332
#> sample estimates:
#> mean of x mean of y 
#>     37.50     47.07

Data

heur01 <- c(1411L, 1734L, 1048L, 2060L, 1983L, 1810L, 1387L, 1637L, 1419L, 1637L, 1185L, 1766L, 1484L, 1983L, 
    1217L, 1915L, 1846L, 1887L, 1742L, 988L, 1375L, 1193L, 2056L, 1919L, 1850L, 2076L, 1463L, 1113L, 1887L, 
    1919L, 1734L, 1157L, 1766L, 1951L, 1923L, 2173L, 1609L, 1895L, 1109L, 1028L, 1701L, 1875L, 1677L, 1653L, 
    1883L, 1677L, 1850L, 1738L, 1520L, 1415L, 1992L, 1919L, 1653L, 1625L, 1705L, 1742L, 1891L, 2108L, 1919L, 
    1911L, 1770L, 1834L, 1911L, 2060L, 1717L, 1943L, 1859L, 1738L, 1222L, 1709L, 2052L, 1141L, 1931L, 2068L, 
    2044L, 1725L, 1818L, 1798L, 1943L, 1939L, 1919L, 1790L, 2116L, 1750L, 2052L, 1605L, 1798L, 2169L, 1665L, 
    1673L, 1185L, 1717L, 1717L, 1657L, 1915L, 1778L, 2121L, 1786L, 1774L, 2056L, 1738L, 1883L, 1754L, 1790L, 
    1770L, 1947L, 1867L, 1794L, 1867L, 1790L, 1762L, 2080L, 1778L, 1903L, 1734L, 1838L, 1560L, 1592L, 1637L, 
    1467L, 1750L, 1653L, 1222L, 1709L, 1806L, 1334L, 1584L, 2052L, 1802L, 1774L, 1770L, 1258L, 1334L, 1322L, 
    1826L, 1600L, 2189L, 1907L, 1548L, 1617L, 1693L, 1020L, 992L, 1435L, 1613L, 1738L, 1419L, 1121L, 1629L, 
    1605L, 1455L, 1157L, 1717L, 1294L, 1359L, 1282L, 1758L, 1395L, 1129L, 1189L, 1790L, 1217L, 1133L, 1516L, 
    1516L, 1278L, 1072L, 911L, 1286L, 968L, 1076L, 1315L, 1221L, 1268L, 939L, 1879L, 986L, 1221L, 1456L, 
    1315L, 1785L, 1080L, 1362L, 1503L, 1127L, 1691L, 1174L, 1644L, 1691L, 939L, 1503L, 1080L, 1503L, 1832L, 
    1362L, 1691L, 1456L, 1879L, 1644L, 1033L)
