bschneidr

Reputation: 6277

How do I get the weights from a survey design object in R?

Question

When working with complex survey data in R, I often use the survey package to create sampling weights or update them using a method such as raking or post-stratification. I know the weights are stored in a survey design object, but how do I extract those weights so I can inspect them or save them to a data file?

Example Data

As an example, we'll load a survey dataset from the "svrep" R package and create a survey design object. We'll also create a bootstrap replicate design object.

data("lou_vax_survey", package = 'svrep')

library(survey)

# Create a survey design object ----
survey_design <- svydesign(data = lou_vax_survey,
                           weights = ~ SAMPLING_WEIGHT,
                           ids = ~ 1)

# Create a replicate survey design object ----
rep_survey_design <- as.svrepdesign(survey_design,
                                    type = "boot",
                                    replicates = 10)

Upvotes: 1

Views: 1041

Answers (1)

bschneidr

Reputation: 6277

Extracting full-sample weights as a vector

To extract the full-sample weights from a survey design object, you can use the function weights().

If you're working with a "regular" survey design object without replicate weights, you can simply use the following:

wts <- weights(survey_design)

head(wts)
#       1       2       3       4       5       6 
# 596.702 596.702 596.702 596.702 596.702 596.702 

If you're working with a replicate survey design object, you need to specify type = "sampling" to get the full-sample weights.

wts <- weights(rep_survey_design, type = 'sampling')

head(wts)
#       1       2       3       4       5       6 
# 596.702 596.702 596.702 596.702 596.702 596.702 

Note that even though we write type = 'sampling', the extracted weights are not necessarily the original sampling weights: they are the design object's current full-sample weights. If you applied post-stratification or raking to your survey design object, for example, calling weights(..., type = 'sampling') will return the post-stratified or raked weights.
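
To see this behavior concretely, here is a minimal sketch that uses a made-up six-row dataset rather than lou_vax_survey. The data frame toy, its base_wt column, and the population totals in pop_totals are all hypothetical, chosen only so the arithmetic is easy to follow. postStratify() is the survey package's post-stratification function.

```r
library(survey)

# Hypothetical toy data: six respondents, each with a base weight of 10
toy <- data.frame(
  SEX     = c("F", "F", "F", "F", "M", "M"),
  base_wt = rep(10, 6)
)

toy_design <- svydesign(data = toy, ids = ~ 1, weights = ~ base_wt)

# Jackknife replicates are used here only because they are deterministic
toy_rep_design <- as.svrepdesign(toy_design, type = "JK1")

# Assumed population totals, for illustration only
pop_totals <- data.frame(SEX = c("F", "M"), Freq = c(50, 50))

ps_rep_design <- postStratify(toy_rep_design,
                              strata = ~ SEX,
                              population = pop_totals)

# Full-sample weights before and after post-stratification
wts_before <- weights(toy_rep_design, type = "sampling")
wts_after  <- weights(ps_rep_design, type = "sampling")

sum(wts_before)
sum(wts_after)
```

If the note above holds, the second sum should match the assumed population total of 100 rather than the original weight total of 60.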

Extracting the matrix of replicate weights

For a replicate design object, call weights(rep_survey_design, type = "analysis") to get the matrix of replicate weights, with one row per observation and one column per replicate.

rep_wts <- weights(rep_survey_design, type = "analysis")

head(rep_wts)
#          [,1]     [,2]    [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
# [1,] 1193.404 1193.404 596.702 1193.404    0.000  596.702    0.000    0.000 1193.404 1193.404
# [2,]  596.702  596.702 596.702    0.000    0.000    0.000  596.702    0.000  596.702  596.702
# [3,] 1193.404  596.702 596.702    0.000 1193.404    0.000 1193.404  596.702    0.000  596.702
# [4,]    0.000 1193.404 596.702 1193.404 1193.404 1193.404 1790.106 1193.404  596.702    0.000
# [5,]    0.000    0.000   0.000  596.702 1790.106  596.702    0.000  596.702    0.000    0.000
# [6,]    0.000 1193.404   0.000 1193.404    0.000  596.702    0.000  596.702    0.000    0.000
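
As a quick sanity check on an extracted matrix, it can help to confirm its shape and look at the total weight in each replicate. The sketch below uses a hypothetical six-row data frame (toy, with an assumed base_wt column) and jackknife replicates, which are deterministic, in place of the bootstrap design above.

```r
library(survey)

# Hypothetical toy data, for illustration only
toy <- data.frame(y = c(1, 0, 1, 1, 0, 1), base_wt = rep(10, 6))

toy_design <- svydesign(data = toy, ids = ~ 1, weights = ~ base_wt)
toy_rep_design <- as.svrepdesign(toy_design, type = "JK1")

full_wts <- weights(toy_rep_design, type = "sampling")
rep_wts  <- weights(toy_rep_design, type = "analysis")

# One row per observation, one column per replicate
dim(rep_wts)
length(full_wts)

# Sum of weights in each replicate, to compare with the full-sample total
colSums(rep_wts)
sum(full_wts)
```

A row count that differs from the number of full-sample weights, or replicate totals far from the full-sample total, would be worth investigating before saving the weights.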

Saving a data frame with columns of weights

Let's say you want to save your data to a CSV file so that you can share it with others or load it into Stata/SAS/SPSS. In this case, you'll want to have a data frame with columns for all of your variables as well as columns with the weights.

For this, you can use the function as_data_frame_with_weights() from the svrep package, which works for survey designs with or without replicate weights.

library(svrep)

df_with_weights <- rep_survey_design |> 
  as_data_frame_with_weights(full_wgt_name = "FULL_SAMPLE_WGT",
                             rep_wgt_prefix = "REP_WGT_")

str(df_with_weights)
# 'data.frame': 1000 obs. of  17 variables:
# $ RESPONSE_STATUS: chr  "Nonrespondent" ...
# $ RACE_ETHNICITY : chr  "White alone, not Hispanic or Latino" ...
# $ SEX            : chr  "Female" ...
# $ EDUC_ATTAINMENT: chr  "Less than high school" ...
# $ VAX_STATUS     : chr  NA ...
# $ SAMPLING_WEIGHT: num  597 ...
# $ FULL_SAMPLE_WGT: num  597 ...
# $ REP_WGT_1      : num  1193 ...
# $ REP_WGT_2      : num  1193 ...
# $ REP_WGT_3      : num  597 ...
# $ REP_WGT_4      : num  1193 ...
# $ REP_WGT_5      : num  0 0 ...
# $ REP_WGT_6      : num  597 ...
# $ REP_WGT_7      : num  0 ...
# $ REP_WGT_8      : num  0 0 ...
# $ REP_WGT_9      : num  1193 ...
# $ REP_WGT_10     : num  1193 ...
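
Once you have such a data frame, writing it to CSV needs only base R. The sketch below uses a small stand-in for df_with_weights with made-up values, and writes to a temporary directory under an assumed file name (survey_with_weights.csv). Substitute your own data frame and path.

```r
# Stand-in for the data frame returned by as_data_frame_with_weights();
# all values below are hypothetical
df_with_weights <- data.frame(
  VAX_STATUS      = c("Vaccinated", NA, "Unvaccinated"),
  FULL_SAMPLE_WGT = c(596.702, 596.702, 596.702),
  REP_WGT_1       = c(1193.404, 596.702, 0),
  REP_WGT_2       = c(0, 596.702, 1193.404)
)

# Assumed output location, for illustration only
out_file <- file.path(tempdir(), "survey_with_weights.csv")

# row.names = FALSE avoids writing an extra, unnamed index column
write.csv(df_with_weights, out_file, row.names = FALSE)

# Read the file back to confirm the weight columns survived the round trip
check <- read.csv(out_file)
names(check)
```

Keeping a common prefix for the replicate-weight columns should make them easier to refer to as a group when the file is loaded into other software.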

Upvotes: 0
