Reputation: 423
I would like to use the tbl_uvregression function (gtsummary package, R) because it can build univariate regression models while holding either a covariate or the outcome constant.
For each outcome, I need one nicely formatted table of univariate regression results covering every variable in the data frame except the outcome itself. This works fine if I subset my data frame to a single outcome plus the covariates of interest before passing it to tbl_uvregression.
However, I have many outcome variables and need help automating this. For each outcome I want one table of univariate regressions that uses the same set of covariates and excludes the other outcome variables, and I want each table labelled so I can tell which outcome it belongs to.
How do I do this?
# Libraries
library(gtsummary)
library(tidyverse)
# Data as well as a few artificial variables
data("iris")
my_iris <- as.data.frame(iris)
my_iris$out1 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out2 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out3 <- sample(c(0,1), 150, replace = TRUE)
# Extra variables below to simulate that the dataframe has extra covariates,
# hence need to select those of interest.
my_iris$x1 <- sample(c(1:12), 150, replace = TRUE)
my_iris$x2 <- sample(c(50:100), 150, replace = TRUE)
my_iris$x3 <- sample(c(18:100), 150, replace = TRUE)
# Outcome (outcome) and predictor (preds) variables I need to run univariate logistic regressions for.
outcome <- c("out1", "out2", "out3") # have a long list, but this is sufficient for demo
preds <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") # same here
# To produce a nicely formatted table for a single outcome I can do:
my_iris %>%
  dplyr::select(outcome[1], all_of(preds)) %>%
  tbl_uvregression(method = glm,
                   y = outcome[1],
                   method.args = list(family = binomial),
                   exponentiate = TRUE) %>%
  bold_labels() %>%
  modify_caption(paste("Univariate Regression Model with", outcome[1], "as Outcome", sep = " "))
# How to automate production of above table for multiple outcomes?
Upvotes: 2
Views: 1328
Reputation: 41
I'd suggest an almost identical method: a for loop, without a wrapper function, that stores the tbl_uvregression objects in a list. Both methods perform about the same.
library(gtsummary)
library(microbenchmark)
### Alternative method using a for loop and a list to store tables ####
mod.list <- list()
loop.list <- list()
for (i in seq_along(outcome)) {
  loop.list[[i]] <- microbenchmark::microbenchmark(
    loop = mod.list[[i]] <- my_iris %>%
      tbl_uvregression(method = glm,
                       y = substitute(i, list(i = as.name(outcome[i]))),
                       include = all_of(preds),
                       method.args = list(family = binomial),
                       exponentiate = TRUE) %>%
      bold_labels() %>%
      modify_caption(paste("Univariate Regression Model with", outcome[i], "as Outcome", sep = " ")),
    times = 5, unit = 's')
}
loop.list
[[1]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.323629 5.40431 6.014162 5.519318 5.642695 8.180858 5
[[2]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.717601 5.848664 6.077736 6.056062 6.265348 6.501003 5
[[3]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.034187 6.038607 6.180724 6.04633 6.257016 6.52748 5
Note: the microbenchmark wrapper is only there to time the code and can be deleted.
### Kevin's method: using lapply ####
lapply(outcome, function(x) {
  microbenchmark::microbenchmark(
    loop = my_iris %>%
      dplyr::select(!!x, all_of(preds)) %>%
      tbl_uvregression(method = glm,
                       y = !!x,
                       method.args = list(family = binomial),
                       exponentiate = TRUE) %>%
      bold_labels() %>%
      modify_caption(paste("Univariate Regression Model with", x, "as Outcome", sep = " ")),
    times = 5, unit = 's')
})
[[1]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.389512 6.727198 7.112398 7.076198 7.845806 8.523274 5
[[2]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.43759 6.444641 6.511864 6.467782 6.523884 6.685421 5
[[3]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.469109 6.562612 6.907325 6.764478 6.856771 7.883655 5
Warning: Having the tbl_uvregression tables in a list to customize later is very convenient, but there is a downside. tbl_uvregression objects are quite big (even after using tbl_butcher), so with many predictors (52 in my case) and many outcomes (21 in my case) the process can take 40 minutes (around two minutes per model, which is not too bad considering the tables are almost ready to publish), and the resulting list is too big to keep in the workspace image. To be honest, I am using survey::svyglm, but I don't think that is why it takes so long.
So if you run the code from the top, as is usually recommended ("only the source code is real"; thanks to Julia Silge's answer to "Problems saving workspace in R"), rendering the Rmd file takes too long (at least on a laptop with a 4-core i7 1.8 GHz processor). Storing the list instead is not practical either, because it is 1.2 GB in size (at least on a laptop with 16 GB of RAM).
I haven't found a way to speed up the process, but in my opinion gtsummary::tbl_uvregression is still worthwhile for this purpose.
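One way to sidestep the size of the list (though not the run time) might be to write each table to disk as soon as it is built, so that only one table sits in memory at a time. This is a minimal sketch under stated assumptions, not a tested solution for a problem of the size described above: `out_dir` and the file names are hypothetical, the seed is only there for reproducibility, and it uses the iris toy data from the question with a plain glm rather than survey::svyglm.

```r
library(gtsummary)
library(dplyr)

set.seed(42)  # assumption: fixed seed, only to make the simulated data reproducible

my_iris <- as.data.frame(iris)
my_iris$out1 <- sample(c(0, 1), 150, replace = TRUE)
my_iris$out2 <- sample(c(0, 1), 150, replace = TRUE)
my_iris$out3 <- sample(c(0, 1), 150, replace = TRUE)

outcome <- c("out1", "out2", "out3")
preds <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Hypothetical output folder; point this at a folder in your own project
out_dir <- file.path(tempdir(), "uv_tables")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

for (y_name in outcome) {
  tbl <- my_iris %>%
    tbl_uvregression(method = glm,
                     y = !!y_name,
                     include = all_of(preds),
                     method.args = list(family = binomial),
                     exponentiate = TRUE) %>%
    bold_labels() %>%
    modify_caption(paste("Univariate Regression Model with", y_name, "as Outcome")) %>%
    tbl_butcher()  # keep only the elements needed to print the table

  # One file per outcome, named after the outcome
  saveRDS(tbl, file.path(out_dir, paste0(y_name, ".rds")))
  rm(tbl)  # release the table before building the next one
}

# Later (for example in the Rmd), read back only the table that is needed
tbl_out1 <- readRDS(file.path(out_dir, "out1.rds"))
tbl_out1
```

Because the table is butchered before saving, functions that need the underlying models (such as add_global_p()) may no longer work on the reloaded object, so any such steps would have to go before tbl_butcher().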
Upvotes: 2
Reputation: 4370
I would use lapply to loop through the outcomes like this:
library(gtsummary)
library(tidyverse)
# Data as well as a few artificial variables
data("iris")
my_iris <- as.data.frame(iris)
my_iris$out1 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out2 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out3 <- sample(c(0,1), 150, replace = TRUE)
# Extra variables below to simulate that the dataframe has extra covariates,
# hence need to select those of interest.
my_iris$x1 <- sample(c(1:12), 150, replace = TRUE)
my_iris$x2 <- sample(c(50:100), 150, replace = TRUE)
my_iris$x3 <- sample(c(18:100), 150, replace = TRUE)
# Outcome (outcome) and predictor (preds) variables I need to run univariate logistic regressions for.
outcome <- c("out1", "out2", "out3") # have a long list, but this is sufficient for demo
preds <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") # same here
# One nicely formatted table per outcome, returned as a list:
lapply(outcome, function(x) {
  my_iris %>%
    dplyr::select(!!x, all_of(preds)) %>%
    tbl_uvregression(method = glm,
                     y = !!x,
                     method.args = list(family = binomial),
                     exponentiate = TRUE) %>%
    bold_labels() %>%
    modify_caption(paste("Univariate Regression Model with", x, "as Outcome", sep = " "))
})
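To keep track of which table belongs to which outcome, one option is to name the list elements after the outcomes. The following is a minimal sketch rather than a definitive implementation: it reuses the same tbl_uvregression call, `tbl_list` is a hypothetical name chosen for illustration, and `set.seed()` is only added so that the simulated outcomes are reproducible.

```r
library(gtsummary)
library(dplyr)

set.seed(42)  # assumption: fixed seed, only to make the simulated data reproducible

my_iris <- as.data.frame(iris)
my_iris$out1 <- sample(c(0, 1), 150, replace = TRUE)
my_iris$out2 <- sample(c(0, 1), 150, replace = TRUE)
my_iris$out3 <- sample(c(0, 1), 150, replace = TRUE)

outcome <- c("out1", "out2", "out3")
preds <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Same call as in the lapply approach, with the result kept in a list
# (tbl_list is a hypothetical name)
tbl_list <- lapply(outcome, function(x) {
  my_iris %>%
    dplyr::select(!!x, all_of(preds)) %>%
    tbl_uvregression(method = glm,
                     y = !!x,
                     method.args = list(family = binomial),
                     exponentiate = TRUE) %>%
    bold_labels() %>%
    modify_caption(paste("Univariate Regression Model with", x, "as Outcome"))
})

# Name each element after its outcome so tables can be looked up by name
names(tbl_list) <- outcome

tbl_list[["out2"]]  # the table for out2
```

The caption already states the outcome in the rendered table; the names only make it easier to pick a table out of the list in later code.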
Upvotes: 3