Reputation: 35
I have a data frame with continuous and factor-type columns. I'm trying to build a summary table with gtsummary stratifying by a variable. My question is as follows:
Thank you!
FC.
Upvotes: 2
Views: 763
Reputation: 4370
I would try something like this, using the iris dataset as an example. To answer your first question, I would use sapply() to run shapiro.test() on each numeric column and check whether the data are normally distributed. I used the p-value (below 0.05 means not normal) as the criterion, but you can substitute your own if something else is more appropriate. After that first step you have two vectors: one naming the variables that are normally distributed and one naming those that are not. You can then pass the first vector to gtsummary to change the statistic and the test for those variables. You do not need to pass anything for the non-normally distributed variables, because the gtsummary defaults for continuous variables (median with quartiles, and a rank-based test) already apply to them.
library(gtsummary)
library(dplyr)

# Shapiro-Wilk p-value for every numeric column
normvals <- sapply(iris[sapply(iris, is.numeric)], function(x) {
  normtest <- shapiro.test(x)
  # output p-value
  normtest$p.value
})

# split the variable names at the 0.05 cutoff
notnorm <- names(normvals[normvals < .05])
norm <- names(normvals[normvals >= .05])

# keep two species so t.test compares exactly two groups;
# converting to character drops the unused "setosa" factor level
irisdf <- filter(iris, Species != "setosa") %>%
  mutate(Species = as.character(Species))

tbl_summary(irisdf,
            by = Species,
            statistic = list(all_of(norm) ~ "{mean} ({sd})")) %>%
  add_p(test = list(all_of(norm) ~ "t.test"))
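If you want to reuse the normality check on other data frames, it can be wrapped in a small function. This is only a sketch: classify_normality() and its alpha argument are names I made up for illustration and are not part of gtsummary. It assumes that any column shapiro.test() cannot handle (fewer than 3 or more than 5000 non-missing values, or all values identical) should be treated as not normal.

```r
# Hypothetical helper (not part of gtsummary): split the numeric columns of a
# data frame into "normal" and "not normal" using the Shapiro-Wilk p-value.
classify_normality <- function(df, alpha = 0.05) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

  p_values <- vapply(num_cols, function(col) {
    # shapiro.test() errors on fewer than 3 or more than 5000 non-missing
    # values, or when all values are identical; return NA in those cases
    tryCatch(shapiro.test(df[[col]])$p.value,
             error = function(e) NA_real_)
  }, numeric(1))

  list(
    p_values   = p_values,
    normal     = names(p_values)[!is.na(p_values) & p_values >= alpha],
    # assumption: columns that could not be tested count as not normal
    not_normal = names(p_values)[is.na(p_values) | p_values < alpha]
  )
}

split_vars <- classify_normality(iris)
split_vars$normal      # names to give "{mean} ({sd})" and "t.test"
split_vars$not_normal  # names left on the gtsummary defaults
```

The normal element can then be used in place of a hand-built vector of names inside all_of() in the tbl_summary() and add_p() calls.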
Edit: you can also hard-code the variable names in the gtsummary call, which makes sure it works on the version that is on CRAN as of 9/22/2020:
tbl_summary(irisdf,
            by = Species,
            statistic = list(c('Sepal.Width', 'Sepal.Length') ~ "{mean} ({sd})")) %>%
  add_p(test = list(c('Sepal.Width', 'Sepal.Length') ~ "t.test"))
Upvotes: 3