crytpodoc

Reputation: 51

gtsummary in survey analysis: variable labels get lost when using subset() from the survey package

Variable labels (set with the labelled package) do not carry over when I subset the survey design with subset() from the survey package, so I end up having to insert the labels manually in the gtsummary call.

library(dplyr)
library(survey)
library(gtsummary)
library(labelled)

#Reading the CSV file (please download a sample dataframe from link below) 
df <- read.csv("nis_2.csv")

#Change to factors 
names <- c("htn", "dm", "FEMALE")
df[, names] <- lapply(df[, names], factor)

#Changing labels 
var_label(df$AGE) <- "Age"
var_label(df$FEMALE) <- "Gender (Female)"
var_label(df$dm) <- "Diabetes"
var_label(df$htn) <- "Hypertension"

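#declare survey design 
#survey.lonely.psu is a global option of the survey package, not an argument of svydesign()
options(survey.lonely.psu = "adjust")
dstr <- svydesign(
 id = ~HOSP_NIS, 
 strata = ~NIS_STRATUM, 
 weights = ~DISCWT, 
 nest = TRUE, 
 data = df)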

#subset the design (in this sample: patients with hypertension, htn == 1) 
small_set <- subset(dstr, (htn == 1))
summary(small_set)

small_set %>%
  tbl_svysummary(
   by=dm,
   include = c(AGE, FEMALE),
   missing = "no", 
   statistic = all_continuous() ~ "{mean} ({sd})"
  ) %>%
  add_p() %>%
  add_overall() %>%
  modify_caption("**Table 1. Patient Characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**History of Diabetes**")

Sample database at: https://github.com/Dr-Kaboum/nis_gt_summary/blob/16909872624714d1feb30bd501a6204aba947de7/nis_2.csv

Upvotes: 0

Views: 336

Answers (1)

Mike

Reputation: 4370

subset() removes the label attributes. Subset your data first, label it, and then pass it to gtsummary.

#example of the label being removed. 
library(labelled)

var_label(mtcars$mpg) <- "Miles per gallon"

#subset() drops the label attribute from the columns
mt2 <- subset(mtcars, cyl == 4)
var_label(mt2$mpg) <- "Miles per gallon" #need to relabel

Edit: using var_label() on a subset of a survey object

I can't access your data, but with an example dataset I can show that you can relabel the variables by accessing the variables element of the survey design object (which is a list). If you set the label there, it will show up in the table.

# A dataset with a complex design
library(gtsummary)
data(api, package = "survey")
labelled::var_label(apiclus1$api99) <- "API 99"

survdat <-
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) 
#this subset will remove labels
survdat2 <- subset(survdat, (cname == "Fresno"))
#relabel here after subset within survey object
labelled::var_label(survdat2$variables$api99) <- "API 99"

#make table with label
ex <- tbl_svysummary(
  data = survdat2,
  by = "both",
  include = c(cname, api00, api99, both)
)
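
If there are many labelled variables, relabelling each one by hand after the subset gets tedious. Below is a minimal sketch of one way to copy the labels over in a single step. It assumes the labels are stored in the "label" attribute of each column, which is where var_label() puts them. copy_labels() is a hypothetical helper written for this illustration and is not part of survey, labelled, or gtsummary. The demonstration uses only base R and mtcars so it runs without extra packages.

```r
# Hypothetical helper: copy the "label" attribute from each column of
# `from` to the same-named column of `to`. Columns without a label are skipped.
copy_labels <- function(from, to) {
  for (nm in intersect(names(from), names(to))) {
    lab <- attr(from[[nm]], "label", exact = TRUE)
    if (!is.null(lab)) attr(to[[nm]], "label") <- lab
  }
  to
}

# Base-R demonstration on a plain data frame
dat <- mtcars
attr(dat$mpg, "label") <- "Miles per gallon"

dat4 <- subset(dat, cyl == 4)       # row subsetting drops the label
label_lost <- is.null(attr(dat4$mpg, "label"))

dat4 <- copy_labels(dat, dat4)      # restore labels from the original data
restored <- attr(dat4$mpg, "label")
```

For the survey example above, the same idea would presumably be applied to the variables element of the design, i.e. survdat2$variables <- copy_labels(survdat$variables, survdat2$variables), before calling tbl_svysummary(). I have not run this against your data.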

Upvotes: 2
