Union find
Union find

Reputation: 8150

How does the R Survey package compute bootstrap confidence intervals?

I'm wondering how to use replication weights and the confint implementation of the survey package to construct bootstrap confidence intervals/standard errors.

Looking at the survey package's implementation of confint, it seems as though it's simply taking the standard error of the theta list generated after replication, and multiplying it by the statistic corresponding to a given alpha range.

But that doesn't really correspond with any bootstrap implementation I'm aware of. Typically, you'd instead be using a percentile of the distribution of theta to get the sample mean's distribution's confidence interval. The T and BCa intervals are another matter.

Here is my R Code. I have not used the provided weights, instead letting repweights weights be generated as "sample with replacement" weights with equal probability.

data(api)
d <- apiclus1 %>% select(fpc, dnum,api99)
dclus1<-svydesign(id=~dnum, data=d, fpc=~fpc)
rclus1<-as.svrepdesign(dclus1,type="bootstrap", replicates=100)

To test the confidence intervals, we can use:

test_mean <- svymean(~api99, rclus1)
confint(test_mean, df=degf(rclus1))
confint(test_mean, df=degf(rclus1)) - mean(d$api99)

Which results:

         2.5 %   97.5 %
api99 554.2971 659.6592
          2.5 %   97.5 %
api99 -52.68107 52.68107

So clearly the interval is symmetric, which defeats some of the purpose of using the bootstrap.

So let's try this:

test_bs <- withReplicates(rclus1, function(w, data) weighted.mean(data$api99, w), return.replicates=T)

This will bootstrap the replicates, where the weights are the repweights (which I assume are with replacement weights). Here are the intervals using BCa intervals on the replications:

bca(test$replicates) - mean(d$api99)
-43.2878  49.1148

Clearly not symmetric.

Using percentile intervals:

c(quantile(test$replicates, 0.025),quantile(test$replicates, 0.975))  - mean(d$api99)
     2.5%     97.5% 
-45.50944  48.06085 

Valliant implements percentile intervals this way, which should be equivalent to my percentile intervals:

 smho.boot.a <- as.svrepdesign(design = smho.dsgn,
                              type = "subbootstrap",
                              replicates = 500)
         # total & CI for EOYCNT based on RW bootstrap
 a1 <- svytotal( ̃EOYCNT, design = smho.boot.a,
                 na.rm=TRUE,
                 return.replicates = TRUE)
              # Compute CI based on bootstrap percentile method.
 ta1 <- quantile(a1$replicates, c(0.025, 0.975))

I'm looking for clarifications on

a. How to construct bootstrap CIs with survey package for statistics of interest b. If my withReplication implementation of the bootstrap is correct

Upvotes: 3

Views: 634

Answers (1)

Thomas Lumley
Thomas Lumley

Reputation: 2765

The percentile stuff doesn't actually work with multistage survey data (or, at least, it isn't known to work (or, at least, it isn't known to me to work)). Survey bootstraps just estimate the variance. You don't get the higher order accuracy for smooth functions that way, but you do get asymptotically the right variance. To get asymmetric intervals you need to bootstrap some suitable function of the parameter, as svyciprop and svyquantile (and in a sense svyglm) do.

If you assume simple random sampling with replacement of clusters then you could make the percentile bootstrap and extensions work, but that's not a common structure for real-world surveys (and it doesn't really need the survey package)

I'm sure the maintainer of the package would be happy to implement a bootstrap that gave better asymmetric intervals and worked for multistage samples, if someone pointed out suitable references to him.

Upvotes: 7

Related Questions