Reputation: 153
I am trying to plot the result of a logistic regression in the log odd scale.
load(url("https://github.com/bossaround/question/raw/master/logisticregressdata.RData"))
ggplot(D, aes(Year, as.numeric(Vote), color = as.factor(Male))) +
stat_smooth( method="glm", method.args=list(family="binomial"), formula = y~x + I(x^2), alpha=0.5, size = 1, aes(fill=as.factor(Male))) +
xlab("Year")
but this plot is on a 0~1 scale. I guess this is the probability scale (correct me if I am wrong)?
What I really want is to plot it on a log odd scale just as the logistic regression reports before converting it to probability.
Ideally, I want to plot the relationship between Vote and Year, by Male, after controlling for Foreign, in a model like this:
Model <- glm(Vote ~ Year + I(Year^2) + Male + Foreign, family="binomial", data=D)
I could manually draw the line based on summary(Model)
, but I also want to plot the confidence interval.
Something like the image on page 44 of this document I found online: http://www.datavis.ca/papers/CARME2015-2x2.pdf. Mine would have a quadratic curve.
Thank you!
Upvotes: 1
Views: 3908
Reputation: 1
Your approach is correct but you need to predict the values with the model you have built something like this:
ModelPredictions <- predict(Model , type="response")
After that, you can plot using ggplot:
ggplot(D, aes(x=ModelPredictions , y=D$Vote )) +
geom_point() + stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial)) + facet_wrap( ~ Foreign)
Upvotes: 0
Reputation: 19716
To plot the predictions of a model with several variables, one should make the model, predict on new data to generate predictions and plot that
Model <- glm(Vote ~ Year + I(Year^2) + Male + Foreign, family="binomial", data=D)
for_pred = expand.grid(Year = seq(from = 2, to = 10, by = 0.1), Male = c(0,1), Foreign = c(0,1)) #generate data to get a smooth line
for_pred = cbind(for_pred, predict(Model, for_pred, type = "link", se.fit= T))
#if the probability scale was needed: `type = "response`
library(ggplot2)
ggplot(for_pred, aes(Year, fit, color = as.factor(Male))) +
geom_line() +
xlab("Year")+
facet_wrap(~Foreign) + #important step - check also how it looks without it
geom_ribbon(aes(ymax = fit + se.fit, ymin = fit - se.fit, fill = as.factor(Male)), alpha = 0.2)
#omit the color by `color = NA` or by `inherit.aes = F` (if like this, one should provide the data and full `aes` mapping for geom_ribbon).
#If geom_ribbon should not have a mapping, specify `fill` outside of `aes` like: `fill = grey80`.
Check out library sjPlot
Upvotes: 3
Reputation: 903
A further answer using augmented()
from broom()
:
Model <- glm(Vote ~ Year + I(Year^2) + Male + Foreign, family="binomial", data=D)
summary(Model)
# augmented data frame
model.df = augment(Model) %>% rename(log_odds = `.fitted`,
Sex = Male)
glimpse(model.df.1)
Observations: 46,398
Variables: 13
$ .rownames <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21"...
$ Vote <dbl> 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, ...
$ Year <int> 2, 3, 4, 5, 2, 3, 2, 3, 4, 5, 6, 2, 2, 2, 3, 4, 2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 2, 3, 2, 3, 4, 5, 2, 3, 4, 5, ...
$ I.Year.2. <S3: AsIs> 4, 9, 16, 25, 4, 9, 4, 9, 16, 25, 36, 4, 4, 4, 9, 16, 4, 9, 16, 25, 36, 49, 64, 81, 4, 9, 16, 2...
$ Sex <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Foreign <int> 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ log_odds <dbl> -0.01910985, -0.68184753, -1.14317053, -1.40307885, 0.26930939, -0.39342829, -0.01910985, -0.68184753, -1.14317053...
$ .se.fit <dbl> 0.01675017, 0.01466136, 0.01790972, 0.02058514, 0.01931826, 0.01777691, 0.01675017, 0.01466136, 0.01790972, 0.0205...
$ .resid <dbl> -1.1693057, -0.9047053, -0.7439452, 1.8016037, -1.2937083, 1.3483961, -1.1693057, -0.9047053, -0.7439452, -0.66303...
$ .hat <dbl> 7.013561e-05, 4.794678e-05, 5.879536e-05, 6.711744e-05, 9.162739e-05, 7.602458e-05, 7.013561e-05, 4.794678e-05, 5....
$ .sigma <dbl> 1.124879, 1.124884, 1.124886, 1.124861, 1.124876, 1.124874, 1.124879, 1.124884, 1.124886, 1.124887, 1.124860, 1.12...
$ .cooksd <dbl> 1.376354e-05, 4.849628e-06, 3.749311e-06, 5.461011e-05, 2.399355e-05, 2.253792e-05, 1.376354e-05, 4.849628e-06, 3....
$ .std.resid <dbl> -1.1693467, -0.9047270, -0.7439671, 1.8016642, -1.2937676, 1.3484474, -1.1693467, -0.9047270, -0.7439671, -0.66305...
#visualise
ggplot(model.df.1, aes(Year, log_odds, colour = Sex)) +
geom_line() +
geom_smooth(se = TRUE) +
facet_wrap( ~ Foreign)
Which gives:
Upvotes: 2