NewBee
NewBee

Reputation: 1040

Applying survey weights, and a weighted average concurrently

I have a survey for which I need to do two things;

  1. I need to apply survey weights to a set of variables using the survey package to retrieve the 'weighted' mean AND
  2. I need to find the weighted average of those variables.

I only want the final weighted mean for each variable after doing both these things.

I know how to find the survey weighted mean and the weighted average separately, but I do not know how to apply them together, or in which order to apply these weights. Here is an example below of my data, and how I could find the 'survey weighted mean' and the 'weighted average' separately.

Please see below for sample data:

library(survey) 

dat_in <- read_table2("code CCS trad_sec    Q1  enrolled    wgt
23  TRUE    sec 20  400 1.4
66  FALSE   trad    40  20  3.0
34  TRUE    sec 30  400 4.4
78  FALSE   sec 40  25  2.2
84  TRUE    trad    20  25  3.7
97  FALSE   sec 10  500 4.1
110 TRUE    sec 80  1000    4.5
123 FALSE   trad    33  679 4.8
137 TRUE    sec 34  764 5.2
150 FALSE   sec 43  850 5.6
163 TRUE    trad    45  935 6.0
177 FALSE   trad    46  1020    6.4
190 TRUE    trad    48  1105    6.7
203 FALSE   trad    50  1190    7.1
217 TRUE    trad    52  1276    7.5
230 FALSE   trad    53  1361    7.9
243 TRUE    trad    55  1446    8.3
256 FALSE   trad    57  1531    8.6
270 TRUE    sec 59  1616    9.0
283 FALSE   sec 60  1701    9.4
296 TRUE    sec 62  1787    9.8
310 FALSE   sec 64  1872    10.2
")

1.To apply survey weights:

Create survey design
SurveyDesign<- svydesign(id =~code,
                                 weights  = ~wgt,
                                 data = dat_in)
    
Find weighted mean and tabulations
# For CCS FALSE, sec 
svymean(~Q1, subset(SurveyDesign,CCS=="FALSE" & trad_sec %in% c("sec")), na.rm = T)

# For CCS TRUE, sec
svymean(~Q1, subset(SurveyDesign,CCS=="TRUE" & trad_sec %in% c("sec")), na.rm = T)

2. To find weighted average:

Weighted average based on enrollment

*edited based on comment

dat_in %>% group_by(CCS, trad_sec) %>% mutate(wgtQ1 = weighted.mean(Q1, w = enrolled))

Possible solution that combines 1 and 2? (based on crowd-source)

generate weighted average by group

dat_in2 <- dat_in %>%    
  group_by(CCS, trad_sec) %>%     
  mutate(wgtQ1 = weighted.mean(Q1, w = enrolled)) %>%     
  ungroup

Create survey design

SurveyDesign2<- svydesign(id =~code,                                   
                                     weights  = ~wgt,
                                     data = dat_in2)

**Run mean on aggreated weighted average

svymean(~wgtQ1, subset(SurveyDesign2,CCS=="FALSE" & trad_sec %in% c("sec")), na.rm = T)

My intuition is that I should apply the weighted average first and THEN apply the survey weights? This above solution seems funky because each row is the weighted average for each group (CCS,trad_sec), whereas the designs object should be fed dis-aggregated data?

All suggestions much appreciated!

Upvotes: 1

Views: 669

Answers (1)

Thomas Lumley
Thomas Lumley

Reputation: 2765

I assume you care about standard error estimates (since otherwise you can just multiply the two sets of weights and use weighted.mean). If so, it matters whether there is sampling uncertainty in the enrolled variable as well as in Q1, and whether the sampling weights should be applied to that variable too. If not, use svyby to get the group means and svycontrast to weight them

> means<-svyby(~Q1, ~CCS, svymean, design=subset(SurveyDesign, trad_sec %in% "sec"),
   covmat=TRUE)
> means
        CCS       Q1       se
FALSE FALSE 50.36825 7.767602
TRUE   TRUE 53.51020 6.453270
> with(subset(dat_in, trad_sec=="sec"), by(enrolled, list(CCS), sum))
: FALSE
[1] 4948
--------------------------------------------------------------------------------------- 
: TRUE
[1] 5967
> svycontrast(means, c(4948/(4948+5967),5967/(4948+4967)))
         contrast     SE
contrast   55.036 5.2423

If you want sampling weights applied to enrolled as well, I think you want svyratio to estimate a sampling-weighted version of sum(enrolled*Q1)/sum(enrolled). You can do that one at a time:

> svyratio(~I(Q1*enrolled),~enrolled, 
        design=subset(SurveyDesign, trad_sec=="sec" & CCS==TRUE))
Ratio estimator: svyratio.survey.design2(~I(Q1 * enrolled), ~enrolled, design = subset(SurveyDesign, 
    trad_sec == "sec" & CCS == TRUE))
Ratios=
                 enrolled
I(Q1 * enrolled) 58.41278
SEs=
                 enrolled
I(Q1 * enrolled) 3.838715
> svyratio(~I(Q1*enrolled),~enrolled, 
       design=subset(SurveyDesign, trad_sec=="sec" & CCS==FALSE))
Ratio estimator: svyratio.survey.design2(~I(Q1 * enrolled), ~enrolled, design = subset(SurveyDesign, 
    trad_sec == "sec" & CCS == FALSE))
Ratios=
                 enrolled
I(Q1 * enrolled) 57.42204
SEs=
                 enrolled
I(Q1 * enrolled) 4.340065

or with svyby

> svyby(~I(Q1*enrolled),~CCS, svyratio, denom=~enrolled,
 design=subset(SurveyDesign, trad_sec=="sec"))
        CCS I(Q1 * enrolled)/enrolled se.I(Q1 * enrolled)/enrolled
FALSE FALSE                  57.42204                     4.340065
TRUE   TRUE                  58.41278                     3.838715

(a note: it helps if you specify all the packages needed for your example code to run; in your case readr for read_table2)

Upvotes: 1

Related Questions