allen.joseph
allen.joseph

Reputation: 47

Fitting a logistic model with multiple dependent vars/LHS

Is there a way to do multivariate logistic regression using glm()? I have several binary outcomes and I know you can do this with linear regression (lm()) and cbind() but I can't seem to figure out how to do it with glm() and cbind()

library(tidyverse)

lm(cbind(mpg, cyl, disp) ~ hp + drat + wt + qsec, data = mtcars) %>% 
  broom::tidy(conf.int = T)

This returns a tidy tibble with your outcomes (mpg, cyl, disp). How can I extend this to logistic regression.

df <- mtcars %>% 
  mutate(across(c(vs, am), factor))

glm(cbind(am, vs) ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>% 
  broom::tidy()

I know this isn't the best dataset to use (probably due to complete separation) but cbind just returns an unexpected output than if you ran the glms separately.

glm(am ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>% 
  broom::tidy()

glm(vs ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>% 
  broom::tidy()

Upvotes: 2

Views: 185

Answers (1)

GRowInG
GRowInG

Reputation: 656

Have a look at

glm(cbind(am, vs) ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df, method = "model.frame") %>% 
broom::tidy()

which returns


# A tibble: 5 × 13
  column            n   mean     sd median trimmed    mad   min    max  range  skew kurtosis      se
  <chr>         <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>   <dbl>
1 cbind(am, vs)    64   1.42  0.498   1       1.40  0      1      2      1    0.316     1.10  0.0622
2 hp               32 147.   68.6   123     141.   52     52    335    283    0.761     3.05 12.1   
3 drat             32   3.60  0.535   3.70    3.58  0.475  2.76   4.93   2.17 0.279     2.44  0.0945
4 wt               32   3.22  0.978   3.32    3.15  0.517  1.51   5.42   3.91 0.444     3.17  0.173 
5 qsec             32  17.8   1.79   17.7    17.8   0.955 14.5   22.9    8.4  0.387     3.55  0.316

In the first line of results in the output above, you can see that glm seems to interpret cbind(am, vs) as a combined dependent variable of am and vs having multiple categories and not being dichotomous (in this example 0 or 1). This kind of dependent variable would require multinomial logistic regression as pointed out by Ben Bolker.

As you mentioned, if you run seperate glm models with one dependent variable at a time, you get different results and warning messages regarding complete separation as well as fitted probabilities having values (very close to) 0 or 1.

If you would like to run the two glm models in one go and present results in a single object, we can use purrr:map as shown here which results in the following approach:

#--------------
# Load packages
#--------------
library(tidyverse)

#--------------
# Define data, including independent variables & dependent variables
#--------------
df <- mtcars
independent.variables.formula <- "~ hp + drat + wt + qsec"
dependent.variables <- c("am", "vs")

#--------------
# create output for both models in a data.frame showing the results
#--------------
df_res <- map(dependent.variables, function(DV) {
  paste(DV, independent.variables.formula) %>% 
    as.formula %>% 
    glm(family=binomial(link = "logit"), data = df) %>%
    broom::tidy(conf.int = T)
}) %>%
  bind_rows() %>%
  as.data.frame () %>%
  mutate (DV = c(rep ("am", 5), rep ("vs", 5)),
          across(c(2:4, 6:7), .fns = function(x) {format(round(x, 2), nsmall = 2)})) %>%
  relocate (DV, .before = term)

df_res[ ,6] <- format.pval(df_res[ ,6], eps = .001, digits = 3) # format small p-values < 0.001 nicely

df_res

  DV        term estimate  std.error statistic p.value   conf.low conf.high
1  am (Intercept)   206.22 1166788.76      0.00   1.000         NA 238011.23
2  am          hp     0.24     687.34      0.00   1.000    -139.85        NA
3  am        drat   103.37  132242.73      0.00   0.999  -10359.33  12068.44
4  am          wt   -94.56   84508.03      0.00   0.999  -54340.19  14115.43
5  am        qsec   -20.03   34601.06      0.00   1.000   -3473.64   3149.50
6  vs (Intercept) -1928.91 1845493.29      0.00   0.999 -124797.75 120939.92
7  vs          hp     1.59    2408.34      0.00   0.999    -489.26        NA
8  vs        drat    92.35  139959.57      0.00   0.999  -16047.24  16231.94
9  vs          wt   -82.30   77899.01      0.00   0.999   -5580.25   4963.40
10 vs        qsec    91.59   75674.95      0.00   0.999   -3734.28   4115.71

Upvotes: 1

Related Questions