r_rabbit
r_rabbit

Reputation: 121

How to subset a survey object in R and use the tbl_svysummary with the add_difference function

I have a survey that I need to filter to include only two specific categories for comparison. A simulation of it can be reproduced with this code:

library(survey)
library(gtsummary)
data(api)

svy <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) 

svy <- subset(svy, stype %in% c("E", "H"))

And I am trying to use this code to create a table with the differences:

svy %>%
  tbl_svysummary(by = stype, include = c(api00, stype),
                 statistic = list(all_categorical() ~ "{p}%",
                     all_continuous() ~ "{mean}"),
    digits = ~ 2,
    missing = "no") %>% 
    add_difference()

And getting this error: Error: 'tbl_summary'/'tbl_svysummary' object must have a by= value with exactly two levels.

How could I drop the level to remain just 2 and therefore use add_difference?

I use the filter in the dataframe, but I'm afraid it will cause me to lose the lifting weights.

Editing this post from the answer from @astra bellow, I've made a small example with my data, but it not works. Works only with the example above

install.packages("PNSIBGE")
library(PNSIBGE)
library(tidyverse)
pns <- get_pns(year = 2019, labels = T)
pns.2 <- subset(pns, C009  %in% c("Branca", "Preta")) 
pns.2$variables$C009 <- droplevels(pns.2$variables$C009)
pns.2 %>%
    tbl_svysummary(by = C009, include = c(C006)) %>% 
    add_difference()

Upvotes: 2

Views: 546

Answers (3)

Joseph Larmarange
Joseph Larmarange

Reputation: 116

First of all, the default test used by add_difference() is a "t.test" and such test is not available for a table produced by tbl_svysummary(). You should use "smd" method. See https://www.danieldsjoberg.com/gtsummary/reference/tests.html#tbl-svysummary-add-difference-

Second, to manipulate survey objects, it is often easier to use srvyr package which allows to use dplyr verbs.

library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#> 
#> Attachement du package : 'survey'
#> L'objet suivant est masqué depuis 'package:graphics':
#> 
#>     dotchart
library(srvyr)
#> 
#> Attachement du package : 'srvyr'
#> L'objet suivant est masqué depuis 'package:stats':
#> 
#>     filter
library(gtsummary)
library(forcats)

data(api)

svy <- apiclus1 |> 
  as_survey_design(id = dnum, weights = pw, fpc = fpc) |> 
  filter(stype %in% c("E", "H")) |> 
  mutate(stype = fct_drop(stype))

svy  |> 
  tbl_svysummary(
    by = stype,
    include = api00,
    statistic = all_continuous() ~ "{mean} ({sd})"
  ) |> 
  add_difference(test = all_continuous() ~ "smd") |> 
  as_kable()
Characteristic E, N = 4,874 H, N = 474 Difference 95% CI
api00 649 (106) 619 (97) 0.30 0.21, 0.40

Created on 2024-02-09 with reprex v2.0.2

Upvotes: 2

Thomas Lumley
Thomas Lumley

Reputation: 2765

This is your problem:

> str(pns.2$variables$C009)
 Factor w/ 6 levels "Branca","Preta",..: 1 4 2 2 4 2 1 1 4 4 ...
> table(pns.2$variables$C009)

  Branca    Preta  Amarela    Parda Indígena Ignorado 
   99019    28304     1698   148273     2064       24 
> svytable(~C009,pns.2)
C009
  Branca    Preta  Amarela    Parda Indígena Ignorado 
91037722 21786515        0        0        0        0 

The issue is that a survey design object isn't just a data frame. In particular, your one has post-stratification information in it. In that situation (and some others) the survey package removes observations by setting the weight to zero rather than by dropping rows (dropping rows is a computational shortcut that isn't always available).

So, you can't just use droplevels, because all the levels are (correctly) still there. If you're working with functions from the survey package you can use factor with an explicit set of levels

> svytotal(~factor(C009,levels=c("Branca","Preta")), pns.2,na.rm=TRUE)
                                                     total     SE
factor(C009, levels = c("Branca", "Preta"))Branca 91037722 686604
factor(C009, levels = c("Branca", "Preta"))Preta  21786515 338021

or equivalently

> pns.2<-update(pns.2, C009new=factor(C009,levels=c("Branca","Preta")))
> svytotal(~C009new, pns.2,na.rm=TRUE)
                 total     SE
C009newBranca 91037722 686604
C009newPreta  21786515 338021

but that doesn't seem to work with tbl_svysummary.

Upvotes: 2

Alex Strashny
Alex Strashny

Reputation: 96

Use the droplevels() function:

svy$variables$stype %<>% droplevels

Upvotes: 2

Related Questions