Reputation: 65
I'm analyzing data from the American Housing Survey, which ships with replicate weights for computing correct standard errors, in R with the survey package, and I want to make sure that I'm specifying the design correctly.
Here is how I do it:
svy <- svrepdesign(data = ahs,
                   weights = ~WEIGHT,
                   repweights = "REPWEIGHT[0-9]+",
                   type = "Fay",
                   rho = 0.5,
                   scale = 4/160,
                   rscales = rep(1, 160),
                   mse = TRUE)
I set rho to 0.5 because section 3.1 of the Census Bureau's guide to using replicate weights (https://www.census.gov/content/dam/Census/programs-surveys/ahs/tech-documentation/2015/Quick%20Guide%20to%20Estimating%20Variance%20Using%20Replicate%20Weights%202009%20to%20Current.pdf), which explains how to compute standard errors with SAS, says to use the option VARMETHOD=BRR(FAY) without specifying any other options and, according to the SAS documentation (http://support.sas.com/documentation/onlinedoc/stat/142/surveymeans.pdf), the default value of the Fay coefficient is 0.5.
I set mse to TRUE because, in the formula the guide gives for the standard error in section 4, the sum of squared deviations is calculated around the estimate of the statistic computed with the full-sample weights.
Finally, I set scale to 4/160 and rscales to rep(1, 160) because, in that same formula, the sum of squared deviations is multiplied by 4/160 but there is no multiplier inside the sum operator.
However, when I look at Anthony Joseph Damico's webpage on the American Housing Survey (http://asdfree.com/american-housing-survey-ahs.html), he does this:
ahs_design <-
    svrepdesign(
        weights = ~ wgt90geo ,
        repweights = "repwgt[1-9]" ,
        type = "Fay" ,
        rho = ( 1 - 1 / sqrt( 4 ) ) ,
        mse = TRUE ,
        data = ahs_df
    )
Setting aside the names of the weight variables, which changed in 2015 (presumably after he wrote that webpage), he's doing the same as me except that he doesn't specify scale and rscales. Based on what I explain above and the documentation of survey, it seems to me that he should specify them as I did, but I've never used replicate weights with survey before, so I would like to make sure.
P. S. What I find even weirder is that, when I leave out scale and rscales, the standard errors I compute seem to be the same as when I specify them. This means that it probably doesn't matter in practice how I do it, but since the formula used to compute the standard errors is supposed to be different if I specify scale and rscales, I would still like to understand why it doesn't seem to affect the standard errors computed by survey.
P. S. bis: Another thing I don't understand is that, even though the Census Bureau says it used Fay's method and recommends a SAS procedure that results in a Fay coefficient of 0.5, there doesn't seem to be any Fay coefficient in the formula for the standard error given in the guide it published. This means that, if I were to write my own code to compute standard errors using that formula, the result would presumably differ from what I get with survey and a rho of 0.5, or with the SAS procedure recommended by the Census Bureau, which doesn't make a lot of sense to me.
Upvotes: 1
Views: 369
Reputation: 186
svrepdesign doesn't need scale or rscales arguments for Fay replicate weights, because it can work them out by itself. That's the point of having known types of weights. I should probably add a warning for when you specify them anyway.
There doesn't need to be a Fay coefficient in the formula explicitly. When the weights were constructed, the sampling weights were multiplied by 2-rho or rho to get replicate weights. That's all been done. Now all you need is to know how to scale the squared residuals. The Census Bureau formula (p6 of your link) has a multiplier of 4/160. That 4 is 1/(1-rho)^2; Anthony Damico's code has the reverse conversion, working out rho=0.5 from the 4.
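Written out, the scaling looks roughly like this. The symbols are my own notation rather than the guide's: \hat\theta_0 is the estimate from the full-sample weight, \hat\theta_r the estimate from the r-th replicate weight, and R the number of replicates. I am assuming the guide's formula has the usual replicate-variance form described in the question.

$$
\widehat{\mathrm{Var}}(\hat\theta)
  = \frac{1}{R\,(1-\rho)^2} \sum_{r=1}^{R} \left(\hat\theta_r - \hat\theta_0\right)^2 ,
\qquad
R = 160,\ \rho = 0.5
\;\Longrightarrow\;
\frac{1}{160 \times 0.25} = \frac{4}{160}
$$

So rho only shows up through the constant in front of the sum, which is why the published formula can carry a plain 4 without naming a Fay coefficient.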
Straightforward BRR would have a multiplier of 1/160 rather than 4/160.
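To check the arithmetic without relying on any package, here is a minimal base-R sketch. It is not how survey is implemented internally; fay_variance is a hypothetical helper written for this illustration, and the replicate estimates are simulated rather than taken from AHS data.

```r
# Replicate-weight variance written out by hand (illustrative helper only).
#   theta0  : estimate computed with the full-sample weight
#   theta_r : vector of estimates, one per replicate weight
#   rho     : Fay coefficient; rho = 0 corresponds to plain BRR
fay_variance <- function(theta0, theta_r, rho = 0) {
  n_rep <- length(theta_r)
  multiplier <- 1 / (n_rep * (1 - rho)^2)
  # Squared deviations are taken around the full-sample estimate (mse = TRUE)
  multiplier * sum((theta_r - theta0)^2)
}

# Made-up replicate estimates, purely for illustration (not AHS data)
set.seed(1)
theta0 <- 100
theta_r <- theta0 + rnorm(160, sd = 0.5)

v_fay   <- fay_variance(theta0, theta_r, rho = 0.5)  # Fay with rho = 0.5
v_guide <- (4 / 160) * sum((theta_r - theta0)^2)     # multiplier from the guide
v_brr   <- fay_variance(theta0, theta_r, rho = 0)    # plain BRR, multiplier 1/160

all.equal(v_fay, v_guide)     # TRUE: rho = 0.5 reproduces the 4/160 multiplier
all.equal(v_fay, 4 * v_brr)   # TRUE: Fay with rho = 0.5 is 4 times plain BRR
```

Taking sqrt(v_fay) gives the standard error. With real data, theta_r would come from re-estimating the statistic once per replicate weight column.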
Upvotes: 3