Reputation: 121
My data is binary with two linear independent variables. For both predictors, as they get bigger, there are more positive responses. I have plotted the data in a heatplot showing density of positive responses along the two variables. There are the most positive responses in the top right corner and negative responses in the bottom left, with a gradient change visible along both axes.
I would like to plot a line on the heatplot showing where a logistic regression model predicts that positive and negative responses are equally likely. (My model is of the form response~predictor1*predictor2+(1|participant)
.)
My question: How can I figure out the line based on this model at which the positive response rate is 0.5?
I tried using predict(), but that works the opposite way; I have to give it values for the factor rather than giving the response rate I want. I also tried using a function that I used before when I had only one predictor (function(x) (log(x/(1-x))-fixef(fit)[1])/fixef(fit)[2]
), but I can only get single values out of that, not a line, and I can only get values for one predictor at a time.
Upvotes: 2
Views: 520
Reputation: 60060
Using a simple example logistic regression model fitted to the mtcars
dataset, and the algebra described here, I can produce a heatmap with a decision boundary using:
library(ggplot2)
library(tidyverse)
data("mtcars")
m1 = glm(am ~ hp + wt, data = mtcars, family = binomial)
# Generate combinations of hp and wt across their observed range. Only
# generating 50 values of each here, which is not a lot but since each
# combination is included, you get 50 x 50 rows
pred_df = expand.grid(
hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50),
wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
)
pred_df$pred_p = predict(m1, pred_df, type = "response")
# For a given value of hp (predictor1), find the value of
# wt (predictor2) that will give predicted p = 0.5
find_boundary = function(hp_val, coefs) {
beta_0 = coefs['(Intercept)']
beta_1 = coefs['hp']
beta_2 = coefs['wt']
boundary_wt = (-beta_0 - beta_1 * hp_val) / beta_2
}
# Find the boundary value of wt for each of the 50 values of hp
# Using the algebra in the linked question you can instead find
# the slope and intercept of the boundary, so you could potentially
# skip this step
boundary_df = pred_df %>%
select(hp) %>%
distinct %>%
mutate(wt = find_boundary(hp, coef(m1)))
ggplot(pred_df, aes(x = hp, y = wt)) +
geom_tile(aes(fill = pred_p)) +
geom_line(data = boundary_df)
Producing:
Note that this only takes into account the fixed effects from the model, so if you want to somehow take into account random effects this could be more complex.
Upvotes: 2