llewmills
llewmills

Reputation: 3568

applying a glm to the same outcome variable using several different predictors in purrr, broom, and dplyr1.0.0

In this post I was shown how to run several glms on several outcomes using the same predictor using the functionality in purrr and broom packages. Now I would like to do the reverse, applying one glm each to several models with different predictors but using the same outcome (i.e. a series of univariate tests)

# data
set.seed(1234)
df <- data.frame(out = c(rbinom(50, 1, prob = 0.2),
                          rbinom(50, 1, prob = 0.8)),
                 pred1 = factor(rep(letters[1:2], each = 50)),
                 pred2 = factor(rep(letters[1:2], times = 50)))

This is what each glm would return if we ran them in isolation

summary(mod1 <- glm(out ~ pred1, data = df, family = binomial))  

# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept)  -1.6582     0.3858  -4.299 1.72e-05 ***
# pred1b        3.4735     0.5612   6.190 6.03e-10 ***


summary(mod2 <- glm(out ~ pred2, data = df, family = binomial))

# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)  0.08004    0.28307   0.283    0.777
# pred2b      -0.08004    0.40016  -0.200    0.841

But I would like to do this all at once and with results returned in a single object, using purr and broom. I tried the following syntax,

df %>% 
  select(starts_with("pred")) %>%
    map_df(~broom::tidy(glm(out ~ .,
                            data = df,
                            family = binomial)))

# output  
#   term        estimate std.error statistic  p.value
#   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
# 1 (Intercept)   -1.58      0.469    -3.38  7.36e- 4
# 2 pred1b         3.48      0.562     6.19  6.19e-10
# 3 pred2b        -0.157     0.561    -0.280 7.79e- 1
# 4 (Intercept)   -1.58      0.469    -3.38  7.36e- 4
# 5 pred1b         3.48      0.562     6.19  6.19e-10
# 6 pred2b        -0.157     0.561    -0.280 7.79e- 1

Now this output is the format I want (i.e. dataframe), but the results themselves are odd, with some coefficients (e.g. pred1) reported correctly but reported twice, some reported twice but incorrectly (e.g. intercept for model 1), and some ommitted (e.g. intercept and coefficient for model 2.

Any help much appreciated

Could someone

Upvotes: 1

Views: 378

Answers (1)

Anoushiravan R
Anoushiravan R

Reputation: 21918

If you want to stick to your approach I suggest you use map2 instead of map:

library(dplyr)
library(purrr)
library(broom)

df %>%
  select(starts_with("pred")) %>%
  map2_dfr(names(df)[-1], ~ tidy(glm(out ~ .x, data = df, family = "binomial")) %>%
            mutate(term = c("(Intercept)", .y)))

# A tibble: 4 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  -1.66       0.386    -4.30  1.72e- 5
2 pred1         3.47       0.561     6.19  6.03e-10
3 (Intercept)   0.0800     0.283     0.283 7.77e- 1
4 pred2        -0.0800     0.400    -0.200 8.41e- 1

This could be another approach a bit verbose but effective:

library(purrr)
library(broom)

fn <- function(n) {
  tidy(glm(df[["out"]] ~ df[[n]], data = df, family = "binomial")) %>%
    mutate(term = c("(Intercept)", names(df)[n]))
}

seq_len(ncol(df))[-c(1, 2)] %>%
  reduce(~ .x %>% 
           bind_rows(fn(.y)), .init = fn(seq_len(ncol(df))[2]))

# A tibble: 4 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  -1.66       0.386    -4.30  1.72e- 5
2 pred1         3.47       0.561     6.19  6.03e-10
3 (Intercept)   0.0800     0.283     0.283 7.77e- 1
4 pred2        -0.0800     0.400    -0.200 8.41e- 1

Upvotes: 3

Related Questions