Reputation: 461
I have predicted values, via:
glm0 <- glm(use ~ as.factor(decision), data = decision_use, family = binomial(link = "logit"))
predicted_glm <- predict(glm0, newdata = decision_use, type = "response", se.fit = TRUE)
predict <- predicted_glm$fit
predict <- predict + 1
head(predict)
        1         2         3         4         5         6
0.3715847 0.3095335 0.3095335 0.3095335 0.3095335 0.5000000
Now when I plot a box plot using ggplot2,
ggplot(decision_use, aes(x = decision, y = predict)) +
  geom_boxplot(aes(fill = factor(decision)), alpha = .2)
I get a box plot with a single horizontal line for each category. The predicted values are identical within each category, so that makes sense.
But I want a box plot that shows a range. How can I get that? When I use "use" instead of predict, I get boxes stretching from end to end (0 to 1), so I suppose that's not it. Thank you in advance.
To clarify, predicted_glm includes se.fit values. I wonder how to incorporate those.
Upvotes: 0
Views: 541
Reputation: 174393
It doesn't really make sense to do a boxplot here. A boxplot shows the range and spread of a continuous variable within groups. Your dependent variable is binary, so the values are all 0 or 1. Since you are plotting predictions for each group, your plot would have just a single point representing the expected value (i.e. the probability) for each group.
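To see why the boxes collapse, here is a minimal sketch on made-up data (the names d, y, g and m are my own placeholders, not from your question). With a single categorical predictor, every observation in a group gets the same fitted probability, and that probability is simply the observed proportion of 1s in the group:

```r
set.seed(1)

# Hypothetical data: binary outcome y with one two-level categorical predictor g
d <- data.frame(y = rbinom(100, 1, 0.4), g = rep(c("A", "B"), 50))
m <- glm(y ~ g, data = d, family = binomial)

# Fitted probabilities on the response scale, one per observation
p <- predict(m, type = "response")

# Count the distinct fitted values within each group (rounded to avoid float noise)
n_distinct <- tapply(p, d$g, function(x) length(unique(round(x, 10))))

# Compare the fitted probability in each group with the observed proportion of 1s
group_fits  <- tapply(p, d$g, function(x) x[1])
group_means <- tapply(d$y, d$g, mean)

print(n_distinct)
print(cbind(fit = group_fits, observed = group_means))
```

Since there is only one value per group, a boxplot has nothing to summarise, which is why you get a flat line.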
The closest you can come is probably to plot the prediction with 95% confidence bars around it.
You haven't provided any sample data, so I'll make some up here:
set.seed(100)
df <- data.frame(outcome = rbinom(200, 1, c(0.1, 0.9)), var1 = rep(c("A", "B"), 100))
Now we'll create our model and get the prediction for each level of my predictor variable using the newdata parameter of predict. I'm going to specify type = "link" because I want the log odds, and I'm also going to specify se.fit = TRUE so I can get the standard error of these predictions:
mod <- glm(outcome ~ var1, data = df, family = binomial)
prediction <- predict(mod, list(var1 = c("A", "B")), se.fit = TRUE, type = "link")
Now I can work out the 95% confidence intervals for my predictions:
prediction$lower <- prediction$fit - prediction$se.fit * 1.96
prediction$upper <- prediction$fit + prediction$se.fit * 1.96
Finally, I transform the fit and confidence intervals from log odds into probabilities:
prediction <- lapply(prediction, function(logodds) exp(logodds)/(1 + exp(logodds)))
plotdf <- data.frame(Group = c("A", "B"), fit = prediction$fit,
                     upper = prediction$upper, lower = prediction$lower)
plotdf
#>   Group  fit     upper      lower
#> 1     A 0.13 0.2111260 0.07700412
#> 2     B 0.92 0.9594884 0.84811360
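As a side note, the back-transformation above is the inverse logit, which base R provides as plogis(), and the 1.96 multiplier is the 97.5% quantile of the standard normal, available as qnorm(0.975). A small sketch to check this, using arbitrary log-odds values I picked for illustration:

```r
# Arbitrary log-odds values, chosen only for illustration
logodds <- c(-3, -1.9, 0, 2.44)

# The manual inverse logit, as used above
manual <- exp(logodds) / (1 + exp(logodds))

# Base R's built-in inverse logit (the logistic distribution function)
builtin <- plogis(logodds)

# The normal quantile behind the 1.96 multiplier for a 95% interval
z <- qnorm(0.975)

print(all.equal(manual, builtin))
print(z)
```

Building the interval on the log-odds scale and transforming afterwards keeps both bounds strictly between 0 and 1, which an interval computed directly on the probability scale would not guarantee.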
Now I am ready to plot. I will use geom_point for the probability estimates and geom_errorbar for the confidence intervals:
library(ggplot2)
ggplot(plotdf, aes(x = Group, y = fit, colour = Group)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), linewidth = 2, width = 0.5) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 3, colour = "black") +
  scale_y_continuous(limits = c(0, 1)) +
  labs(title = "Probability estimate with 95% CI", y = "Probability")
Created on 2020-05-11 by the reprex package (v0.3.0)
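If it helps, the same steps can be wrapped up for a model shaped like the one in the question. This is only a sketch: level_ci is a helper name I made up, and the decision_use data below is simulated as a stand-in because the real data wasn't posted.

```r
# Hypothetical helper (not part of any package): fitted probability and Wald CI
# for each level of a single categorical predictor in a binomial glm
level_ci <- function(model, var, levels, conf = 0.95) {
  newdata <- data.frame(factor(levels, levels = levels))
  names(newdata) <- var
  pred <- predict(model, newdata = newdata, se.fit = TRUE, type = "link")
  z <- qnorm(1 - (1 - conf) / 2)
  data.frame(
    level = levels,
    fit   = unname(plogis(pred$fit)),
    lower = unname(plogis(pred$fit - z * pred$se.fit)),
    upper = unname(plogis(pred$fit + z * pred$se.fit))
  )
}

set.seed(42)

# Simulated stand-in for decision_use: three decision levels, binary use
decision_use <- data.frame(decision = factor(sample(1:3, 300, replace = TRUE)))
decision_use$use <- rbinom(300, 1, c(0.3, 0.4, 0.5)[decision_use$decision])

glm0 <- glm(use ~ decision, data = decision_use, family = binomial)
ci <- level_ci(glm0, "decision", levels(decision_use$decision))
print(ci)
```

The resulting data frame has one row per level, so it can be passed to a geom_point and geom_errorbar plot in the same way as plotdf above.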
Upvotes: 1