Reputation: 6277
When working with complex survey data in R, I often use the survey package to create sampling weights or update them using a method such as raking or post-stratification. I know the weights are stored in a survey design object, but how do I extract those weights so I can inspect them or save them to a data file?
As an example, we'll load a survey dataset from the "svrep" R package and create a survey design object, along with a bootstrap replicate design object.
data("lou_vax_survey", package = 'svrep')
library(survey)
# Create a survey design object ----
survey_design <- svydesign(data = lou_vax_survey,
                           weights = ~ SAMPLING_WEIGHT,
                           ids = ~ 1)
# Create a replicate survey design object ----
rep_survey_design <- as.svrepdesign(survey_design,
                                    type = "boot",
                                    replicates = 10)
Upvotes: 1
Views: 1041
Reputation: 6277
To extract the full-sample weights from a survey design object, you can use the function weights().
If you're working with a "regular" survey design object without replicate weights, you can simply use the following:
wts <- weights(survey_design)
head(wts)
# 1 2 3 4 5 6
# 596.702 596.702 596.702 596.702 596.702 596.702
If you're working with a replicate survey design object, you need to specify type = "sampling" to get the full-sample weights.
wts <- weights(rep_survey_design, type = 'sampling')
head(wts)
# 1 2 3 4 5 6
# 596.702 596.702 596.702 596.702 596.702 596.702
Note that even though we write type = 'sampling', the weights that are extracted are not necessarily the original sampling weights. If you applied post-stratification or raking to your survey design object, for example, calling weights(..., type = 'sampling') will return the post-stratified or raked weights.
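To see this behavior in action, here is a minimal sketch. It does not use the lou_vax_survey data; instead it assumes the api example data that ships with the survey package, and the population counts by school type are the ones used in the postStratify() help page. The object names (dclus1, rclus1, pop_types, ps_wts) are only illustrative.

```r
library(survey)

# Example data shipped with the survey package (an assumption for this sketch,
# not the lou_vax_survey data used elsewhere in this answer)
data(api)

# One-stage cluster sample of school districts; base weights are in `pw`
dclus1 <- svydesign(ids = ~ dnum, weights = ~ pw, data = apiclus1)

# Convert to a replicate design (a jackknife is chosen automatically here)
rclus1 <- as.svrepdesign(dclus1)

# Population counts of schools by type, taken from the postStratify() help page
pop_types <- data.frame(stype = c("E", "H", "M"),
                        Freq  = c(4421, 755, 1018))

# Post-stratify the replicate design to those counts
rclus1_ps <- postStratify(rclus1, strata = ~ stype, population = pop_types)

# Before post-stratification: the original base weights
head(weights(rclus1, type = "sampling"))

# After post-stratification: type = "sampling" returns the adjusted weights,
# which sum to the population count within each school type
ps_wts <- weights(rclus1_ps, type = "sampling")
tapply(ps_wts, apiclus1$stype, sum)
```

Comparing the two head() and tapply() results is a quick way to confirm which set of weights a design object currently holds.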
For a replicate design object, you can specify weights(rep_survey_design, type = "analysis") to get the matrix of replicate weights.
rep_wts <- weights(rep_survey_design, type = "analysis")
head(rep_wts)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1193.404 1193.404 596.702 1193.404 0.000 596.702 0.000 0.000 1193.404 1193.404
# [2,] 596.702 596.702 596.702 0.000 0.000 0.000 596.702 0.000 596.702 596.702
# [3,] 1193.404 596.702 596.702 0.000 1193.404 0.000 1193.404 596.702 0.000 596.702
# [4,] 0.000 1193.404 596.702 1193.404 1193.404 1193.404 1790.106 1193.404 596.702 0.000
# [5,] 0.000 0.000 0.000 596.702 1790.106 596.702 0.000 596.702 0.000 0.000
# [6,] 0.000 1193.404 0.000 1193.404 0.000 596.702 0.000 596.702 0.000 0.000
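Since the question also asks about inspecting the weights, the following sketch shows a few base R checks that may be useful once the weights are extracted. It rebuilds the design objects from the question so that it runs on its own; because the bootstrap replicates are random, the replicate values will differ from run to run, so no output is shown.

```r
library(survey)

# Rebuild the design objects from the question
data("lou_vax_survey", package = "svrep")
survey_design <- svydesign(data = lou_vax_survey,
                           weights = ~ SAMPLING_WEIGHT,
                           ids = ~ 1)
rep_survey_design <- as.svrepdesign(survey_design,
                                    type = "boot",
                                    replicates = 10)

# Full-sample weights: a numeric vector with one entry per row of the data
full_wts <- weights(rep_survey_design, type = "sampling")
summary(full_wts)  # distribution of the weights
sum(full_wts)      # total of the weights

# Replicate weights: a matrix with one row per observation
# and one column per replicate
rep_wts <- weights(rep_survey_design, type = "analysis")
dim(rep_wts)
colSums(rep_wts)   # total of the weights within each replicate
```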
Let's say you want to save your data to a CSV file so that you can share it with others or load it into Stata/SAS/SPSS. In this case, you'll want to have a data frame with columns for all of your variables as well as columns with the weights.
For this, you can use the function as_data_frame_with_weights() from the svrep package, which works for survey designs with or without replicate weights.
library(svrep)
df_with_weights <- rep_survey_design |>
  as_data_frame_with_weights(full_wgt_name = "FULL_SAMPLE_WGT",
                             rep_wgt_prefix = "REP_WGT_")
str(df_with_weights)
# 'data.frame': 1000 obs. of 17 variables:
# $ RESPONSE_STATUS: chr "Nonrespondent" ...
# $ RACE_ETHNICITY : chr "White alone, not Hispanic or Latino" ...
# $ SEX : chr "Female" ...
# $ EDUC_ATTAINMENT: chr "Less than high school" ...
# $ VAX_STATUS : chr NA ...
# $ SAMPLING_WEIGHT: num 597 ...
# $ FULL_SAMPLE_WGT: num 597 ...
# $ REP_WGT_1 : num 1193 ...
# $ REP_WGT_2 : num 1193 ...
# $ REP_WGT_3 : num 597 ...
# $ REP_WGT_4 : num 1193 ...
# $ REP_WGT_5 : num 0 0 ...
# $ REP_WGT_6 : num 597 ...
# $ REP_WGT_7 : num 0 ...
# $ REP_WGT_8 : num 0 0 ...
# $ REP_WGT_9 : num 1193 ...
# $ REP_WGT_10 : num 1193 ...
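To finish the CSV step, one option is base R's write.csv(). The sketch below rebuilds the objects so it runs on its own; the file name and the use of tempdir() are placeholders I've assumed for illustration, so substitute whatever path you actually want.

```r
library(survey)
library(svrep)

# Rebuild the replicate design from the question
data("lou_vax_survey", package = "svrep")
survey_design <- svydesign(data = lou_vax_survey,
                           weights = ~ SAMPLING_WEIGHT,
                           ids = ~ 1)
rep_survey_design <- as.svrepdesign(survey_design,
                                    type = "boot",
                                    replicates = 10)

# Data frame with the survey variables plus full-sample and replicate weights
df_with_weights <- as_data_frame_with_weights(rep_survey_design,
                                              full_wgt_name = "FULL_SAMPLE_WGT",
                                              rep_wgt_prefix = "REP_WGT_")

# Hypothetical output location; replace with a real path of your choosing
csv_path <- file.path(tempdir(), "lou_vax_survey_with_weights.csv")

# row.names = FALSE avoids writing an extra unnamed index column
write.csv(df_with_weights, file = csv_path, row.names = FALSE)

# Read the file back in to confirm that all columns were written
check <- read.csv(csv_path)
dim(check)
```

Other software can then read the CSV directly, using FULL_SAMPLE_WGT as the main weight and the REP_WGT_ columns as the replicate weights.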
Upvotes: 0