Reputation: 71
I'm working on an analysis with weighted survey data in R and I'm trying to calculate survey weighted prevalence ratios for several different covariates. Using the code below, I can use svytable to calculate weighted prevalence of condom_use
in the strata of each covariate (age
, rurality
, and sexual_orientation
) comparing two time periods:
library(survey)
set.seed(123)
#Example dataset
dat <- data.frame(
year = factor(sample(c("2019", "2020"), 100, replace = TRUE)),
sv_weight = sample(1:150, 100, replace = TRUE),
strat = sample(531011:532149, 100, replace = TRUE),
condom_use = sample(c(0, 1), 100, replace = TRUE),
age = factor(sample(c("18-29", "30-39", "40-49"), 100, replace = TRUE)),
rurality = factor(sample(c("Rural", "Urban"), 100, replace = TRUE)),
sexual_orientation = factor(sample(c("Straight", "Not straight"), 100, replace = TRUE)))
#Survey design options
options(survey.lonely.psu = "adjust")
design <- svydesign(data = dat,
id = ~1,
strata = ~strat,
weights = ~sv_weight)
#Creating prevalence tables for each covariate
table2 <- svytable(~year + condom_use + age, design) %>% prop.table(margin = 1)
table2
table3 <- svytable(~year + condom_use + rurality, design) %>% prop.table(margin = 1)
table3
table4 <- svytable(~year + condom_use + sexual_orientation, design) %>% prop.table(margin = 1)
table4
From these tables, I can calculate a prevalence ratio by dividing the prevalence of condom use in 2020 to the prevalence in 2019. I'm looking for the prevalence ratio within each strata, like for example comparing condom use in "rural" in 2020 compared to in "rural" in 2019.
My problem is that this method doesn't give me the option of calculating a confidence interval. Does anyone know of a method for getting a weighted prevalence ratio AND 95% confidence interval within each strata of a covariate?
Upvotes: 0
Views: 1063
Reputation: 2765
You could use a generalised linear model for a prevalence ratio.
Here, the coefficient marked year2020
is the log prevalence ratio in the 'Not straightgroup and the interaction coefficient is the log prevalence ratio in the
Straight` group
> summary(model<-svyglm(condom_use~sexual_orientation*year, design=design, family=quasibinomial(log)),df=Inf)
Call:
svyglm(formula = condom_use ~ sexual_orientation * year, design = design,
family = quasibinomial(log))
Survey design:
svydesign(data = dat, id = ~1, strata = ~strat, weights = ~sv_weight)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.7267 0.2506 -2.900 0.00373 **
sexual_orientationStraight 0.2618 0.2931 0.893 0.37175
year2020 0.1059 0.3469 0.305 0.76018
sexual_orientationStraight:year2020 -0.2215 0.4348 -0.509 0.61046
Now exponentiate the coefficients to get prevalence ratios
> exp(coef(model))
(Intercept) sexual_orientationStraight
0.4834875 1.2992183
year2020 sexual_orientationStraight:year2020
1.1116982 0.8013263
The 1.111696
matches what I get from your table4
for the ratio in Not straight
.
Now for confidence intervals use confint
> exp(confint(model,ddf=Inf))
2.5 % 97.5 %
(Intercept) 0.2958642 0.7900927
sexual_orientationStraight 0.7315182 2.3074863
year2020 0.5632601 2.1941424
sexual_orientationStraight:year2020 0.3417622 1.8788616
You can also subset the data and fit the model just to each stratum at a time.
Note: the ddf=Inf
is because your design has only 3 design degrees of freedom, which isn't enough to actually estimate a confidence interval. If your real data has more degrees of freedom you can omit it.
Upvotes: 0