Reputation: 8150
I'm wondering how to use replication weights and the confint implementation of the survey package to construct bootstrap confidence intervals/standard errors.
Looking at the survey
package's implementation of confint, it seems as though it's simply taking the standard error of the theta
list generated after replication, and multiplying it by the statistic corresponding to a given alpha range.
But that doesn't really correspond with any bootstrap implementation I'm aware of. Typically, you'd instead be using a percentile of the distribution of theta to get the sample mean's distribution's confidence interval. The T and BCa intervals are another matter.
Here is my R Code. I have not used the provided weights, instead letting repweights
weights be generated as "sample with replacement" weights with equal probability.
data(api)
d <- apiclus1 %>% select(fpc, dnum,api99)
dclus1<-svydesign(id=~dnum, data=d, fpc=~fpc)
rclus1<-as.svrepdesign(dclus1,type="bootstrap", replicates=100)
To test the confidence intervals, we can use:
test_mean <- svymean(~api99, rclus1)
confint(test_mean, df=degf(rclus1))
confint(test_mean, df=degf(rclus1)) - mean(d$api99)
Which results:
2.5 % 97.5 %
api99 554.2971 659.6592
2.5 % 97.5 %
api99 -52.68107 52.68107
So clearly the interval is symmetric, which defeats some of the purpose of using the bootstrap.
So let's try this:
test_bs <- withReplicates(rclus1, function(w, data) weighted.mean(data$api99, w), return.replicates=T)
This will bootstrap the replicates, where the weights are the repweights (which I assume are with replacement weights). Here are the intervals using BCa intervals on the replications:
bca(test$replicates) - mean(d$api99)
-43.2878 49.1148
Clearly not symmetric.
Using percentile intervals:
c(quantile(test$replicates, 0.025),quantile(test$replicates, 0.975)) - mean(d$api99)
2.5% 97.5%
-45.50944 48.06085
Valliant implements percentile intervals this way, which should be equivalent to my percentile intervals:
smho.boot.a <- as.svrepdesign(design = smho.dsgn,
type = "subbootstrap",
replicates = 500)
# total & CI for EOYCNT based on RW bootstrap
a1 <- svytotal( ̃EOYCNT, design = smho.boot.a,
na.rm=TRUE,
return.replicates = TRUE)
# Compute CI based on bootstrap percentile method.
ta1 <- quantile(a1$replicates, c(0.025, 0.975))
I'm looking for clarifications on
a. How to construct bootstrap CIs with survey package for statistics of interest
b. If my withReplication
implementation of the bootstrap is correct
Upvotes: 3
Views: 634
Reputation: 2765
The percentile stuff doesn't actually work with multistage survey data (or, at least, it isn't known to work (or, at least, it isn't known to me to work)). Survey bootstraps just estimate the variance. You don't get the higher order accuracy for smooth functions that way, but you do get asymptotically the right variance. To get asymmetric intervals you need to bootstrap some suitable function of the parameter, as svyciprop
and svyquantile
(and in a sense svyglm
) do.
If you assume simple random sampling with replacement of clusters then you could make the percentile bootstrap and extensions work, but that's not a common structure for real-world surveys (and it doesn't really need the survey package)
I'm sure the maintainer of the package would be happy to implement a bootstrap that gave better asymmetric intervals and worked for multistage samples, if someone pointed out suitable references to him.
Upvotes: 7