notasfarwest
notasfarwest

Reputation: 71

R svytable - Weighted prevalence ratios AND confidence intervals within strata of a covariate?

I'm working on an analysis with weighted survey data in R and I'm trying to calculate survey weighted prevalence ratios for several different covariates. Using the code below, I can use svytable to calculate weighted prevalence of condom_use in the strata of each covariate (age, rurality, and sexual_orientation) comparing two time periods:

library(survey)
set.seed(123)

#Example dataset
dat <- data.frame(
  year = factor(sample(c("2019", "2020"), 100, replace = TRUE)),
  sv_weight = sample(1:150, 100, replace = TRUE),
  strat = sample(531011:532149, 100, replace = TRUE),
  condom_use = sample(c(0, 1), 100, replace = TRUE),
  age = factor(sample(c("18-29", "30-39", "40-49"), 100, replace = TRUE)),
  rurality = factor(sample(c("Rural", "Urban"), 100, replace = TRUE)),
  sexual_orientation =  factor(sample(c("Straight", "Not straight"), 100, replace = TRUE)))

#Survey design options
options(survey.lonely.psu = "adjust")
design <- svydesign(data = dat,
                    id = ~1,
                    strata = ~strat,
                    weights = ~sv_weight)

#Creating prevalence tables for each covariate
table2 <- svytable(~year + condom_use + age, design) %>% prop.table(margin = 1)
table2

table3 <- svytable(~year + condom_use + rurality, design) %>% prop.table(margin = 1)
table3

table4 <- svytable(~year + condom_use + sexual_orientation, design) %>% prop.table(margin = 1)
table4

From these tables, I can calculate a prevalence ratio by dividing the prevalence of condom use in 2020 to the prevalence in 2019. I'm looking for the prevalence ratio within each strata, like for example comparing condom use in "rural" in 2020 compared to in "rural" in 2019.

My problem is that this method doesn't give me the option of calculating a confidence interval. Does anyone know of a method for getting a weighted prevalence ratio AND 95% confidence interval within each strata of a covariate?

Upvotes: 0

Views: 1063

Answers (1)

Thomas Lumley
Thomas Lumley

Reputation: 2765

You could use a generalised linear model for a prevalence ratio.

Here, the coefficient marked year2020 is the log prevalence ratio in the 'Not straightgroup and the interaction coefficient is the log prevalence ratio in theStraight` group

> summary(model<-svyglm(condom_use~sexual_orientation*year, design=design, family=quasibinomial(log)),df=Inf)

Call:
svyglm(formula = condom_use ~ sexual_orientation * year, design = design, 
    family = quasibinomial(log))

Survey design:
svydesign(data = dat, id = ~1, strata = ~strat, weights = ~sv_weight)

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)   
(Intercept)                          -0.7267     0.2506  -2.900  0.00373 **
sexual_orientationStraight            0.2618     0.2931   0.893  0.37175   
year2020                              0.1059     0.3469   0.305  0.76018   
sexual_orientationStraight:year2020  -0.2215     0.4348  -0.509  0.61046   

Now exponentiate the coefficients to get prevalence ratios

> exp(coef(model))
                        (Intercept)          sexual_orientationStraight 
                          0.4834875                           1.2992183 
                           year2020 sexual_orientationStraight:year2020 
                          1.1116982                           0.8013263 

The 1.111696 matches what I get from your table4 for the ratio in Not straight.

Now for confidence intervals use confint

> exp(confint(model,ddf=Inf))
                                        2.5 %    97.5 %
(Intercept)                         0.2958642 0.7900927
sexual_orientationStraight          0.7315182 2.3074863
year2020                            0.5632601 2.1941424
sexual_orientationStraight:year2020 0.3417622 1.8788616

You can also subset the data and fit the model just to each stratum at a time.

Note: the ddf=Inf is because your design has only 3 design degrees of freedom, which isn't enough to actually estimate a confidence interval. If your real data has more degrees of freedom you can omit it.

Upvotes: 0

Related Questions