Reputation: 3568
In this post I was shown how to run several glms on several outcomes using the same predictor using the functionality in purrr
and broom
packages. Now I would like to do the reverse, applying one glm each to several models with different predictors but using the same outcome (i.e. a series of univariate tests)
# data
set.seed(1234)
df <- data.frame(out = c(rbinom(50, 1, prob = 0.2),
rbinom(50, 1, prob = 0.8)),
pred1 = factor(rep(letters[1:2], each = 50)),
pred2 = factor(rep(letters[1:2], times = 50)))
This is what each glm would return if we ran them in isolation
summary(mod1 <- glm(out ~ pred1, data = df, family = binomial))
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -1.6582 0.3858 -4.299 1.72e-05 ***
# pred1b 3.4735 0.5612 6.190 6.03e-10 ***
summary(mod2 <- glm(out ~ pred2, data = df, family = binomial))
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 0.08004 0.28307 0.283 0.777
# pred2b -0.08004 0.40016 -0.200 0.841
But I would like to do this all at once and with results returned in a single object, using purr
and broom
. I tried the following syntax,
df %>%
select(starts_with("pred")) %>%
map_df(~broom::tidy(glm(out ~ .,
data = df,
family = binomial)))
# output
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) -1.58 0.469 -3.38 7.36e- 4
# 2 pred1b 3.48 0.562 6.19 6.19e-10
# 3 pred2b -0.157 0.561 -0.280 7.79e- 1
# 4 (Intercept) -1.58 0.469 -3.38 7.36e- 4
# 5 pred1b 3.48 0.562 6.19 6.19e-10
# 6 pred2b -0.157 0.561 -0.280 7.79e- 1
Now this output is the format I want (i.e. dataframe), but the results themselves are odd, with some coefficients (e.g. pred1) reported correctly but reported twice, some reported twice but incorrectly (e.g. intercept for model 1), and some ommitted (e.g. intercept and coefficient for model 2.
Any help much appreciated
Could someone
Upvotes: 1
Views: 378
Reputation: 21918
If you want to stick to your approach I suggest you use map2
instead of map
:
library(dplyr)
library(purrr)
library(broom)
df %>%
select(starts_with("pred")) %>%
map2_dfr(names(df)[-1], ~ tidy(glm(out ~ .x, data = df, family = "binomial")) %>%
mutate(term = c("(Intercept)", .y)))
# A tibble: 4 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -1.66 0.386 -4.30 1.72e- 5
2 pred1 3.47 0.561 6.19 6.03e-10
3 (Intercept) 0.0800 0.283 0.283 7.77e- 1
4 pred2 -0.0800 0.400 -0.200 8.41e- 1
This could be another approach a bit verbose but effective:
library(purrr)
library(broom)
fn <- function(n) {
tidy(glm(df[["out"]] ~ df[[n]], data = df, family = "binomial")) %>%
mutate(term = c("(Intercept)", names(df)[n]))
}
seq_len(ncol(df))[-c(1, 2)] %>%
reduce(~ .x %>%
bind_rows(fn(.y)), .init = fn(seq_len(ncol(df))[2]))
# A tibble: 4 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -1.66 0.386 -4.30 1.72e- 5
2 pred1 3.47 0.561 6.19 6.03e-10
3 (Intercept) 0.0800 0.283 0.283 7.77e- 1
4 pred2 -0.0800 0.400 -0.200 8.41e- 1
Upvotes: 3