Reputation: 11
I am interested in calculating the log-odds of the relationship between a continuous predictor and dichotomous outcome for purposes of graphically evaluating the linearity assumption for a logistic regression model. Does anyone know a formula for this? My key issue is I am unsure how to calculate an event rate for each level of the continuous predictor (i.e. number with outcome/total observations at that level).
Thank you!
Upvotes: 1
Views: 987
Reputation: 173793
Let's simulate some data to show how this can be done.
Imagine we are testing a new electrical product, and we test at a variety of temperatures to see whether temperature affects failure rate.
set.seed(69)
df <- data.frame(temperature = seq(0, 100, length.out = 1000),
failed = rbinom(1000, 1, seq(0.1, 0.9, length.out = 1000)))
So we have two columns: the temperature, and a dichotomous column of 1 (failed) and 0 (passed).
We can get a rough measure of the relationship between temperature and failure rate just by cutting our data frame into 5 degree bins:
df$temp_range <- cut(df$temperature, seq(0, 100, 5), include.lowest = TRUE)
We can now plot the proportion of devices that failed within each 5 degree temperature band:
library(ggplot2)
ggplot(df, aes(x = temp_range, y = failed)) + stat_summary()
#> No summary function supplied, defaulting to `mean_se()`
We can see that the probability of failure appears to go up linearly with temperature.
Now, if we get the proportions of failures in each bin, we take these as the estimate of probability of failure. This allows us to calculate the log odds of failure within each bin:
counts <- table(df$temp_range, df$failed)
probs <- counts[,2]/rowSums(counts)
logodds <- log(probs/(1 - probs))
temp_range <- seq(2.5, 97.5, 5)
logit_df <- data.frame(temp_range, probs, logodds)
So now, we can plot the log odds. Here, we will make our x axis continuous by taking the mid point of each bin as the x co-ordinate. We can then draw a linear regression through our points:
p <- ggplot(logit_df, aes(temp_range, logodds)) +
geom_point() +
geom_smooth(method = "lm", colour = "black", linetype = 2, se = FALSE)
p
#> `geom_smooth()` using formula 'y ~ x'
and in fact carry out a linear regression:
summary(lm(logodds ~ temp_range))
#>
#> Call:
#> lm(formula = logodds ~ temp_range)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.70596 -0.20764 -0.06761 0.18100 1.31147
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.160639 0.207276 -10.42 4.70e-09 ***
#> temp_range 0.046025 0.003591 12.82 1.74e-10 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.463 on 18 degrees of freedom
#> Multiple R-squared: 0.9012, Adjusted R-squared: 0.8957
#> F-statistic: 164.2 on 1 and 18 DF, p-value: 1.738e-10
We can see that the linear assumption is reasonable here.
What we have just done is like a crude form of logistic regression. Let's now do it properly:
model <- glm(failed ~ temperature, data = df, family = binomial())
summary(model)
#>
#> Call:
#> glm(formula = failed ~ temperature, family = binomial(), data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.1854 -0.8514 0.4672 0.8518 2.0430
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -2.006197 0.159997 -12.54 <2e-16 ***
#> temperature 0.043064 0.002938 14.66 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 1383.4 on 999 degrees of freedom
#> Residual deviance: 1096.0 on 998 degrees of freedom
#> AIC: 1100
#>
#> Number of Fisher Scoring iterations: 3
Notice how close the coefficients are to our hand-crafted model.
Now that we have this model, we can plot its predictions over our crude linear estimate:
mod_df <- data.frame(temp_range = 1:100,
logodds = predict(model, newdata = list(temperature = 1:100)))
p + geom_line(data = mod_df, colour = "red", linetype = 3, size = 2)
#> `geom_smooth()` using formula 'y ~ x'
Pretty close.
Created on 2020-06-19 by the reprex package (v0.3.0)
Upvotes: 2