R: automate table for results of several multivariable logistic regressions

Question

structure(list(Number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15), age = c(25, 26, 27, 28, 29, 30, 31, 32, 33, 
34, 35, 36, 37, 38, 39), sex = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 
0, 1, 0, 1, 0), bmi = c(35, 32, 29, 26, 23, 20, 17, 35, 32, 29, 
26, 23, 20, 17, 21), Phenotype1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 1), `Phenotype 2` = c(0, 1, 0, 1, 0, 1, 0, 1, 
0, 1, 0, 1, 1, 1, 1), `Phenotype 3` = c(1, 0, 1, 0, 1, 1, 1, 
1, 1, 1, 1, 0, 0, 0, 0), `Phenotype 4` = c(0, 0, 0, 0, 1, 1, 
0, 1, 0, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 15 x 8
   Number   age   sex   bmi Phenotype1 `Phenotype 2` `Phenotype 3` `Phenotype 4`
                                        
 1      1    25     0    35          0             0             1             0
 2      2    26     1    32          0             1             0             0
 3      3    27     0    29          0             0             1             0
 4      4    28     1    26          0             1             0             0
 5      5    29     0    23          0             0             1             1
 6      6    30     1    20          0             1             1             1
 7      7    31     0    17          0             0             1             0
 8      8    32     1    35          0             1             1             1
 9      9    33     0    32          0             0             1             0
10     10    34     1    29          0             1             1             1
11     11    35     0    26          0             0             1             1
12     12    36     1    23          0             1             0             1
13     13    37     0    20          1             1             0             1
14     14    38     1    17          1             1             0             1
15     15    39     0    21          1             1             0             1

Hi all, I have a dataset of 100 patients (15 are shown here), 3 covariates and 50 phenotypes(4 are shown here). I want to perform a multivariable logistic regression for each phenotype using age, sex and BMI as covariates, I would like to get a table like this, where I have the p-value, OR and confidence interval(CI)per each of the covariates.

I just don't know how to start. Thank you very much for your help!

Best, Caro

Alec B · Accepted Answer

I wrote a function that should accomplish what you need. There are likely more elegant and more R-like ways of doing this, but this approach worked in my testing:

## Load libraries
library(broom)
library(tidyr)
library(dplyr)


## Define a function to create your summary table
summary_table <- function(x) {
  
  # Capture number of columns passed to the function
  num_vars <- ncol(x)
  
  # Pre-define lists that will be populated and then collapsed by rest of function
  models <- vector("list", length = num_vars)
  first_tables <- vector("list", length = num_vars)
  second_tables <- vector("list", length = num_vars)
  
  # Loop to create each row for the final table
  for (i in 1:num_vars) {
      
    models[[i]] <- glm(x[[i]] ~ age + sex + bmi, family = "binomial", data = df)
      
    first_tables[[i]] <- broom::tidy(models[[i]])
    first_tables[[i]]$OR <- exp(first_tables[[i]]$estimate)
    first_tables[[i]]$CI1 <- exp(first_tables[[i]]$estimate - (1.96 * first_tables[[i]]$std.error))
    first_tables[[i]]$CI2 <- exp(first_tables[[i]]$estimate + (1.96 * first_tables[[i]]$std.error))
      
    first_tables[[i]] <- as.data.frame(first_tables[[i]][first_tables[[i]]$term != "(Intercept)", c("term", "p.value", "OR", "CI1", "CI2")])[1:3,]
      
      
    second_tables[[i]] <- first_tables[[i]] %>% 
                            pivot_wider(names_from = term, values_from = c("p.value", "OR", "CI1", "CI2")) %>%
                            select("p.value_age", "OR_age", "CI1_age", "CI2_age", "p.value_bmi", "OR_bmi", "CI1_bmi", "CI2_bmi",
                                   "p.value_sex", "OR_sex", "CI1_sex", "CI2_sex")
    
  } 
  
  # Combine the rows together into a final table
  final_table <- do.call("rbind", second_tables)
  final_table <- round(final_table, 3)
  row.names(final_table) <- rep(paste0("Phenotype", 1:num_vars))
  
  return(final_table)

}

## Let "df" be your data.frame with 100 rows and 54 columns

## Use the summary_table() function, passing in the 50 columns containing your Phenotype outcome vars (I assumed they're in columns 5:54)
final_table <- summary_table(df[5:54])

## Write the final table to your working directory as a CSV
write.csv(final_table, "final_table.csv")

R: automate table for results of several multivariable logistic regressions

Answers (1)

Related Questions