Carolin V

Reputation: 51

Loop glm for every column in R dataset

I have a dataset of 100 patients (7 are shown here), 2 covariates, and 50 phenotypes (5 are shown here). I want to fit a multivariable logistic regression for each phenotype, using Covariate1 and Covariate2 as covariates to predict the Outcome. I would like to get a table with the p-value, odds ratio (OR), and confidence interval (CI) for each of the covariates.

I tried:

for (i in df) {
  print(i)
  # fails: `i` is a whole column here, and `x` is not defined
  model <- glm(Outcome ~ x[i] + Covariate1 + Covariate2, family = binomial(link = "logit"), data = df)
}

I also tried the solution to this question, but x and y are reversed in my case, so it did not work: R: automate table for results of several multivariable logistic regressions

Thanks very much for your help!

This is an example dataset:

df<-structure(list(ID = c(1, 2, 3, 4, 5, 6, 7), Outcome = c(0, 0, 
1, 1, 0, 1, 0), Covariate1 = c(1, 2, 3, 4, 5, 6, 7), Covariate2 = c(0, 
0, 0, 1, 1, 1, 1), P1 = c(1, 0, 0, 1, 1, 1, 2), P2 = c(0, 2, 
0, 1, 1, 1, 1), P3 = c(0, 0, 0, 1, 1, 1, 1), P4 = c(0, 0, 0, 
1, 2, 1, 1), P5 = c(0, 0, 0, 1, 1, 1, 2)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -7L))

Upvotes: 0

Views: 39

Answers (1)

Yuriy Saraykin

Reputation: 8880

If I understood correctly:

df <- structure(
  list(
    ID = c(1, 2, 3, 4, 5, 6, 7),
    Outcome = c(0, 0, 1, 1, 0, 1, 0),
    Covariate1 = c(1, 2, 3, 4, 5, 6, 7),
    Covariate2 = c(0, 0, 0, 1, 1, 1, 1),
    P1 = c(1, 0, 0, 1, 1, 1, 2),
    P2 = c(0, 2, 0, 1, 1, 1, 1),
    P3 = c(0, 0, 0, 1, 1, 1, 1),
    P4 = c(0, 0, 0, 1, 2, 1, 1),
    P5 = c(0, 0, 0, 1, 1, 1, 2)
  ),
  class = c("tbl_df",
            "tbl", "data.frame"),
  row.names = c(NA,-7L)
)

library(tidyverse)

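# Fit one model per phenotype column; inside the formula, .x is the
# current phenotype vector, so its coefficient is reported as ".x".
first_tables <- map(
  .x = select(df, starts_with("P")),
  .f = ~ glm(
    Outcome ~ .x + Covariate1 + Covariate2,
    family = binomial(link = "logit"),
    data = df
  )
) %>%
  map(broom::tidy)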

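# For each fitted model, convert log-odds estimates to odds ratios with
# Wald 95% limits (CI5/CI95 hold the lower and upper bounds).
map_df(
  .x = first_tables,
  .f = ~ .x %>% mutate(
    p = p.value,
    OR  = exp(estimate),
    CI5 = exp(estimate - 1.96 * std.error),
    CI95 = exp(estimate + 1.96 * std.error),
    .keep = "unused"  # drop estimate, std.error and p.value once used
  ) %>%
    select(-statistic),
  .id = "phenotype"
) %>%
  filter(term == ".x") %>%  # keep only the phenotype row of each model
  select(-term)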
#> # A tibble: 5 x 5
#>   phenotype     p       OR     CI5  CI95
#>   <chr>     <dbl>    <dbl>   <dbl> <dbl>
#> 1 P1        0.997 5.84e-10 0        Inf 
#> 2 P2        0.996 1.53e- 4 0        Inf 
#> 3 P3        0.824 2.00e+ 0 0.00442  904.
#> 4 P4        0.998 3.66e- 9 0        Inf 
#> 5 P5        0.997 2.72e-10 0        Inf

Created on 2023-01-11 with reprex v2.0.2
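
The question asks for the p-value, OR and CI for each covariate, while the table above keeps only the phenotype row. If the Covariate1 and Covariate2 rows are wanted too, a base R variant might look like the sketch below. It builds each formula from the column name with reformulate(), so the term is labelled P1, P2, ... instead of .x. The names phenotypes, fit_one, results, CI_low and CI_high are my own choices for illustration, and the "^P[0-9]+$" pattern assumes the real phenotype columns are named like the example ones. Note that in the example data P3 is identical to Covariate2, so one of the two is aliased and drops out of that model's summary.

```r
# Example data rebuilt as a plain data.frame so the block runs on its own
df <- data.frame(
  ID = 1:7,
  Outcome = c(0, 0, 1, 1, 0, 1, 0),
  Covariate1 = c(1, 2, 3, 4, 5, 6, 7),
  Covariate2 = c(0, 0, 0, 1, 1, 1, 1),
  P1 = c(1, 0, 0, 1, 1, 1, 2),
  P2 = c(0, 2, 0, 1, 1, 1, 1),
  P3 = c(0, 0, 0, 1, 1, 1, 1),
  P4 = c(0, 0, 0, 1, 2, 1, 1),
  P5 = c(0, 0, 0, 1, 1, 1, 2)
)

# Assumption: phenotype columns are named P1, P2, ...
phenotypes <- grep("^P[0-9]+$", names(df), value = TRUE)

fit_one <- function(pheno) {
  # Builds e.g. Outcome ~ P1 + Covariate1 + Covariate2 from strings
  f <- reformulate(c(pheno, "Covariate1", "Covariate2"), response = "Outcome")
  # The 7-row example triggers separation warnings; silenced for this sketch only
  fit <- suppressWarnings(glm(f, family = binomial(link = "logit"), data = df))
  est <- coef(summary(fit))
  est <- est[rownames(est) != "(Intercept)", , drop = FALSE]
  z <- qnorm(0.975)  # about 1.96, for Wald 95% limits
  data.frame(
    phenotype = pheno,
    term = rownames(est),
    p = est[, "Pr(>|z|)"],
    OR = exp(est[, "Estimate"]),
    CI_low = exp(est[, "Estimate"] - z * est[, "Std. Error"]),
    CI_high = exp(est[, "Estimate"] + z * est[, "Std. Error"]),
    row.names = NULL
  )
}

# One row per (phenotype, term), stacked into a single table
results <- do.call(rbind, lapply(phenotypes, fit_one))
results
```

On the full dataset I would not suppress the warnings, since they are what flags models where the estimates and intervals cannot be trusted.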

Upvotes: 2
