Determining Interpretation Direction of Logistic Model in R

Question

I am trying to run a logistic regression to predict a variable called has_sed (binary, describes whether a sample has sediment or not, coded as 0 = does not have sediment and 1 = has sediment). See the summary output of this model below:

Call:
glm(formula = has_sed ~ vw + ws_avg + s, family = binomial(link = "logit"), 
data = spdata_ss)

Deviance Residuals: 
Min       1Q   Median       3Q      Max  
-1.4665  -0.8659  -0.6325   1.1374   2.3407  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.851966   0.667291   1.277 0.201689    
vw          -0.118140   0.031092  -3.800 0.000145 ***
ws_avg      -0.015815   0.008276  -1.911 0.055994 .  
s            0.034471   0.019216   1.794 0.072827 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 296.33  on 241  degrees of freedom
Residual deviance: 269.91  on 238  degrees of freedom
AIC: 277.91

Number of Fisher Scoring iterations: 4

Now, I understand how to interpret a logistic model output like this in general, but I don't understand how R chooses the direction (may be a better word for that) of my dependent variable. How do I know if a unit increase in vw increases the log odds of a sample having sediment, or increases the log odds of that sample not having sediment (i.e., has_sed = 0 vs has_sed = 1)?

I plotted out each of these relationships with boxplots, and the sign on the estimates in the logistic model output looks reversed from what I'm seeing in the boxplots. So, does R calculate the log odds of has_sed being 0, or the log odds of it being 1?

missuse · Accepted Answer

This is best illustrated with an example, I will use iris data with two classes

data(iris)
iris2 = iris[iris$Species!="setosa",]
iris2$Species = factor(iris2$Species)
levels(iris2$Species)
#output[1] "versicolor" "virginica"

Lets make a glm

   model = glm(Species ~ Petal.Length, data = iris2, family = binomial(link = "logit"))
summary(model)
#truncated output
    Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -43.781     11.110  -3.941 8.12e-05 ***
Petal.Length    9.002      2.283   3.943 8.04e-05 ***


library(ggplot2)

  ggplot(iris2)+
  geom_boxplot(aes(x = Species, y = Petal.Length))

Chances of being "virginica" rise with increasing Petal.Length, the reference level was "versicolor" - the first level when we did levels(iris2$Species).

Lets change it

iris2$Species = relevel(iris2$Species, ref = "virginica")
levels(iris2$Species)
#output
[1] "virginica"  "versicolor"

model2 = glm(Species ~ Petal.Length, data = iris2, family = binomial(link = "logit"))
summary(model2)
#truncated output
Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    43.781     11.110   3.941 8.12e-05 ***
Petal.Length   -9.002      2.283  -3.943 8.04e-05 ***

Now the reference level is "virginica" the first level in levels(iris2$Species). Chances of being "versicolor" drop with increasing Petal.Length.

In short the order of levels in your response variable determines the reference level for treatment contrasts.

Determining Interpretation Direction of Logistic Model in R

Answers (1)

Related Questions