hy9fesh
hy9fesh

Reputation: 661

How do I create a predicted probabilities graph with confidence intervals for a multinomial logistic regression with an interaction term in R?

Is there an easier way to create a predicted probabilities graph with confidence intervals for a multinomial logistic regression with an interaction term? I have the following code below, which seems like a very long way to do something that I could just do with one line of code for other regression types. I'm also wondering if creating three graphs for each of my y levels if the best way to represent the results.

Also, I was wondering if there's a way to make the geom_ribbon to look smoother.

# data frame
df <- data.frame(
  rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse","1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
  count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
  case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y"),
  cool = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)

# regression model
fit <- multinom(rating ~ count * case + cool, data = df)

It appears that the next step would be to generate data for the prediction.

# new data for prediction
new_data <- expand.grid(
  count = seq(min(df$count), max(df$count), length.out = 100),
  case = unique(df$case),
  cool = unique(df$cool)
)

# simulations
n_sims <- 1000

# simulate predictions
simulate_preds <- function(model, newdata, n) {
  preds <- predict(model, newdata = newdata, type = "probs")
  simulated_preds <- replicate(n, {
    beta_sims <- matrix(rnorm(length(preds), 0, 0.1), ncol = nrow(preds))
    preds + t(beta_sims)
  })
  simulated_preds
}

simulated_probs <- simulate_preds(fit, newdata = new_data, n = n_sims)

# confidence intervals
quantiles <- apply(simulated_probs, c(1, 2), function(x) {
  quantile(x, c(0.025, 0.975))
})

So when I combine the quantiles, I would have three separate sets of upper and lower bounds:

# combine to create new df
new_data <- cbind(new_data, quantiles[1,,], quantiles[2,,])

# rename columns
colnames(new_data) <- c("count", "case", "cool", "Better_Low", "Medium_Low", "Worse_Low", "Better_High", "Medium_High", "Worse_High")

Here's my graph:

ggplot(new_data, aes(x = count, color = factor(cool))) +
  facet_wrap(~ yes) +
  geom_ribbon(aes(ymin = Better_Low, ymax = Better_High)) +
  labs(x = "Count", y = "Predicted Probability", color = "Cool", linetype = "Case") +
  ggtitle("Predicted Probabilities by Rating, Case, and Coolness with Confidence Intervals") +
  theme_bw()

And I repeat this for medium and worse...

But the I add the stat and method commands for the geom_ribbon function, it throws an error:

ggplot(new_data, aes(x = count, color = factor(cool))) +
  facet_wrap(~ yes) +
  geom_ribbon(aes(ymin = Better_Low, ymax = Better_High), alpha = 0.3, stat = "smooth", method = "loess") +
  labs(x = "Count", y = "Predicted Probability", color = "Cool", linetype = "Case") +
  ggtitle("Predicted Probabilities by Rating, Case, and Coolness with Confidence Intervals") +
  theme_bw()

`geom_smooth()` using formula = 'y ~ x'
Error in `geom_ribbon()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `compute_layer()`:
! `stat_smooth()` requires the following missing aesthetics: y
Run `rlang::last_trace()` to see where the error occurred.

Upvotes: 0

Views: 795

Answers (2)

Daniel
Daniel

Reputation: 7832

Note that the confidence bands are out of the plausible range for probabilities (e.g., below 0% and above 100%). You could also try following, which returns the same point estimates, but intervals are inside plausible bounds (another disclaimer: I am the ggeffects author ;-)):

library(ggeffects)
library(nnet)

df <- data.frame(
  rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse", "1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
  count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
  case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y"),
  cool = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)

fit <- multinom(rating ~ count * case + cool, data = df, trace = FALSE)

# original data for count, a bit ugly plot
ggemmeans(fit, c("count", "case")) |> plot()


# using more values for a smoother plot
ggemmeans(fit, c("count [1:10 by=0.1]", "case")) |> plot()

Created on 2024-02-24 with reprex v2.1.0

Upvotes: 1

Vincent
Vincent

Reputation: 17805

You could use the marginaleffects package for this. Your example data is too small and weird to produce reasonable-looking plots, but the code below shows the results anyway. If you want to learn more, you can read these two vignettes:

library(marginaleffects)
library(nnet)

df <- data.frame(
  rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse","1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
  count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
  case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y"),
  cool = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)

fit <- multinom(rating ~ count * case + cool, data = df, trace = FALSE)

plot_predictions(fit,
    type = "probs",
    condition = c("count", "case", "group")
)

Disclaimer: I am the marginaleffects author.

Upvotes: 3

Related Questions