Reputation: 1040
I have a survey for which I need to do two things;
I only want the final weighted mean for each variable after doing both these things.
I know how to find the survey weighted mean and the weighted average separately, but I do not know how to apply them together, or in which order to apply these weights. Here is an example below of my data, and how I could find the 'survey weighted mean' and the 'weighted average' separately.
Please see below for sample data:
library(survey)
dat_in <- read_table2("code CCS trad_sec Q1 enrolled wgt
23 TRUE sec 20 400 1.4
66 FALSE trad 40 20 3.0
34 TRUE sec 30 400 4.4
78 FALSE sec 40 25 2.2
84 TRUE trad 20 25 3.7
97 FALSE sec 10 500 4.1
110 TRUE sec 80 1000 4.5
123 FALSE trad 33 679 4.8
137 TRUE sec 34 764 5.2
150 FALSE sec 43 850 5.6
163 TRUE trad 45 935 6.0
177 FALSE trad 46 1020 6.4
190 TRUE trad 48 1105 6.7
203 FALSE trad 50 1190 7.1
217 TRUE trad 52 1276 7.5
230 FALSE trad 53 1361 7.9
243 TRUE trad 55 1446 8.3
256 FALSE trad 57 1531 8.6
270 TRUE sec 59 1616 9.0
283 FALSE sec 60 1701 9.4
296 TRUE sec 62 1787 9.8
310 FALSE sec 64 1872 10.2
")
SurveyDesign<- svydesign(id =~code,
weights = ~wgt,
data = dat_in)
# For CCS FALSE, sec
svymean(~Q1, subset(SurveyDesign,CCS=="FALSE" & trad_sec %in% c("sec")), na.rm = T)
# For CCS TRUE, sec
svymean(~Q1, subset(SurveyDesign,CCS=="TRUE" & trad_sec %in% c("sec")), na.rm = T)
*edited based on comment
dat_in %>% group_by(CCS, trad_sec) %>% mutate(wgtQ1 = weighted.mean(Q1, w = enrolled))
generate weighted average by group
dat_in2 <- dat_in %>%
group_by(CCS, trad_sec) %>%
mutate(wgtQ1 = weighted.mean(Q1, w = enrolled)) %>%
ungroup
Create survey design
SurveyDesign2<- svydesign(id =~code,
weights = ~wgt,
data = dat_in2)
**Run mean on aggreated weighted average
svymean(~wgtQ1, subset(SurveyDesign2,CCS=="FALSE" & trad_sec %in% c("sec")), na.rm = T)
My intuition is that I should apply the weighted average first and THEN apply the survey weights? This above solution seems funky because each row is the weighted average for each group (CCS,trad_sec), whereas the designs object should be fed dis-aggregated data?
All suggestions much appreciated!
Upvotes: 1
Views: 669
Reputation: 2765
I assume you care about standard error estimates (since otherwise you can just multiply the two sets of weights and use weighted.mean
). If so, it matters whether there is sampling uncertainty in the enrolled
variable as well as in Q1
, and whether the sampling weights should be applied to that variable too. If not, use svyby
to get the group means and svycontrast
to weight them
> means<-svyby(~Q1, ~CCS, svymean, design=subset(SurveyDesign, trad_sec %in% "sec"),
covmat=TRUE)
> means
CCS Q1 se
FALSE FALSE 50.36825 7.767602
TRUE TRUE 53.51020 6.453270
> with(subset(dat_in, trad_sec=="sec"), by(enrolled, list(CCS), sum))
: FALSE
[1] 4948
---------------------------------------------------------------------------------------
: TRUE
[1] 5967
> svycontrast(means, c(4948/(4948+5967),5967/(4948+4967)))
contrast SE
contrast 55.036 5.2423
If you want sampling weights applied to enrolled
as well, I think you want svyratio
to estimate a sampling-weighted version of
sum(enrolled*Q1)/sum(enrolled)
. You can do that one at a time:
> svyratio(~I(Q1*enrolled),~enrolled,
design=subset(SurveyDesign, trad_sec=="sec" & CCS==TRUE))
Ratio estimator: svyratio.survey.design2(~I(Q1 * enrolled), ~enrolled, design = subset(SurveyDesign,
trad_sec == "sec" & CCS == TRUE))
Ratios=
enrolled
I(Q1 * enrolled) 58.41278
SEs=
enrolled
I(Q1 * enrolled) 3.838715
> svyratio(~I(Q1*enrolled),~enrolled,
design=subset(SurveyDesign, trad_sec=="sec" & CCS==FALSE))
Ratio estimator: svyratio.survey.design2(~I(Q1 * enrolled), ~enrolled, design = subset(SurveyDesign,
trad_sec == "sec" & CCS == FALSE))
Ratios=
enrolled
I(Q1 * enrolled) 57.42204
SEs=
enrolled
I(Q1 * enrolled) 4.340065
or with svyby
> svyby(~I(Q1*enrolled),~CCS, svyratio, denom=~enrolled,
design=subset(SurveyDesign, trad_sec=="sec"))
CCS I(Q1 * enrolled)/enrolled se.I(Q1 * enrolled)/enrolled
FALSE FALSE 57.42204 4.340065
TRUE TRUE 58.41278 3.838715
(a note: it helps if you specify all the packages needed for your example code to run; in your case readr
for read_table2
)
Upvotes: 1