Reputation: 121
I have a survey that I need to filter to include only two specific categories for comparison. A simulation of it can be reproduced with this code:
library(survey)
library(gtsummary)
data(api)
svy <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
svy <- subset(svy, stype %in% c("E", "H"))
And I am trying to use this code to create a table with the differences:
svy %>%
tbl_svysummary(by = stype, include = c(api00, stype),
statistic = list(all_categorical() ~ "{p}%",
all_continuous() ~ "{mean}"),
digits = ~ 2,
missing = "no") %>%
add_difference()
And getting this error: Error: 'tbl_summary'/'tbl_svysummary' object must have a by=
value with exactly two levels.
How could I drop the level to remain just 2 and therefore use add_difference?
I use the filter in the dataframe, but I'm afraid it will cause me to lose the lifting weights.
Editing this post from the answer from @astra bellow, I've made a small example with my data, but it not works. Works only with the example above
install.packages("PNSIBGE")
library(PNSIBGE)
library(tidyverse)
pns <- get_pns(year = 2019, labels = T)
pns.2 <- subset(pns, C009 %in% c("Branca", "Preta"))
pns.2$variables$C009 <- droplevels(pns.2$variables$C009)
pns.2 %>%
tbl_svysummary(by = C009, include = c(C006)) %>%
add_difference()
Upvotes: 2
Views: 546
Reputation: 116
First of all, the default test used by add_difference()
is a "t.test" and such test is not available for a table produced by tbl_svysummary()
. You should use "smd" method. See https://www.danieldsjoberg.com/gtsummary/reference/tests.html#tbl-svysummary-add-difference-
Second, to manipulate survey objects, it is often easier to use srvyr package which allows to use dplyr verbs.
library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#>
#> Attachement du package : 'survey'
#> L'objet suivant est masqué depuis 'package:graphics':
#>
#> dotchart
library(srvyr)
#>
#> Attachement du package : 'srvyr'
#> L'objet suivant est masqué depuis 'package:stats':
#>
#> filter
library(gtsummary)
library(forcats)
data(api)
svy <- apiclus1 |>
as_survey_design(id = dnum, weights = pw, fpc = fpc) |>
filter(stype %in% c("E", "H")) |>
mutate(stype = fct_drop(stype))
svy |>
tbl_svysummary(
by = stype,
include = api00,
statistic = all_continuous() ~ "{mean} ({sd})"
) |>
add_difference(test = all_continuous() ~ "smd") |>
as_kable()
Characteristic | E, N = 4,874 | H, N = 474 | Difference | 95% CI |
---|---|---|---|---|
api00 | 649 (106) | 619 (97) | 0.30 | 0.21, 0.40 |
Created on 2024-02-09 with reprex v2.0.2
Upvotes: 2
Reputation: 2765
This is your problem:
> str(pns.2$variables$C009)
Factor w/ 6 levels "Branca","Preta",..: 1 4 2 2 4 2 1 1 4 4 ...
> table(pns.2$variables$C009)
Branca Preta Amarela Parda Indígena Ignorado
99019 28304 1698 148273 2064 24
> svytable(~C009,pns.2)
C009
Branca Preta Amarela Parda Indígena Ignorado
91037722 21786515 0 0 0 0
The issue is that a survey design object isn't just a data frame. In particular, your one has post-stratification information in it. In that situation (and some others) the survey
package removes observations by setting the weight to zero rather than by dropping rows (dropping rows is a computational shortcut that isn't always available).
So, you can't just use droplevels
, because all the levels are (correctly) still there. If you're working with functions from the survey
package you can use factor
with an explicit set of levels
> svytotal(~factor(C009,levels=c("Branca","Preta")), pns.2,na.rm=TRUE)
total SE
factor(C009, levels = c("Branca", "Preta"))Branca 91037722 686604
factor(C009, levels = c("Branca", "Preta"))Preta 21786515 338021
or equivalently
> pns.2<-update(pns.2, C009new=factor(C009,levels=c("Branca","Preta")))
> svytotal(~C009new, pns.2,na.rm=TRUE)
total SE
C009newBranca 91037722 686604
C009newPreta 21786515 338021
but that doesn't seem to work with tbl_svysummary
.
Upvotes: 2
Reputation: 96
Use the droplevels()
function:
svy$variables$stype %<>% droplevels
Upvotes: 2