Reputation: 27
I have tried to find many resources about multinomial regression. However, I am having a hard time finding visualizations that would show probability of a multiclass response variable given a multiclass dependent variable. I have an example below where I want to predict EQUIPMENT based on WEEKDAY and COUNT. How can I do this using nnet::multinom() and provide a meaningful visualization to help understand predicted classes.
dummy_data <- function(size){
LOCATION <- sample(c("LOC_A", "LOC_B", "LOC_B"), size, replace = T, prob = c(0.4, 0.4, 0.2))
EQUIPMENT <- sample(c("EQUIP_A", "EQUIP_B", "EQUIP_C", "EQUIP_D"), size, replace = TRUE)
df <- data.frame(LOCATION, EQUIPMENT)
df$COUNT <- sample(c(1:10), size, replace = TRUE)
startTime <- as.POSIXct("2016-01-01")
endTime <- as.POSIXct("2019-01-31")
df$DATE <- as.Date(sample(seq(startTime, endTime, 1), size))
df$WEEKDAY <- weekdays(as.Date(df$DATE))
return(df)
}
df<- dummy_data(100)
library(nnet)
model <- nnet::multinom(EQUIPMENT ~ WEEKDAY + COUNT, data = df)
How should I continue from here? I tried going off of this example and would like a similar visualization to represent probabilities of class broken down by WEEKDAY and EQUIPMENT. I'm having a hard time following this example. Can anyone help? https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/
Upvotes: 0
Views: 1272
Reputation: 36
I think what you're looking for in the UCLA website example is from "look at the averaged predicted probabilities for different values of the continuous predictor variable write (COUNT in your data) within each level of ses (WEEKDAY in your data)."
In order to do so, I'll add the following lines to what you did before multinom:
df$EQUIPMENT <- as.factor(df$EQUIPMENT)
df$EQUIPMENT <- relevel(df$EQUIPMENT, ref = "EQUIP_A")
df$WEEKDAY <- as.factor(df$WEEKDAY)
df$WEEKDAY <- relevel(df$WEEKDAY, ref = "Sunday")
model <- nnet::multinom(EQUIPMENT ~ WEEKDAY + COUNT, data = df)
And then follow the example by
dmodel <- data.frame(WEEKDAY = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), each = 10), COUNT = rep(c(1:10),7))
pp.model <- cbind(dmodel, predict(model, newdata = dmodel, type = "probs", se = TRUE))
lmodel <- melt(pp.model, id.vars = c("WEEKDAY", "COUNT"), value.name = "probability")
ggplot(lmodel, aes(x = COUNT, y = probability, colour = WEEKDAY)) + geom_line() + facet_grid(variable ~., scales = "free")
However, you can see that the figure looks quite different from the UCLA example. Why? Before making the plot, we need to check whether the predictors are significant or not first. To do so simply (rather than following the steps in the example link), I'll use the epiDisplay package:
epiDisplay::mlogit.display(model, decimal=3, alpha=0.05)
Outcome =EQUIPMENT; Referent group = EQUIP_A
EQUIP_B EQUIP_C EQUIP_D
Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI)
(Intercept) -0.083/1.0419 - 0.036/0.9569 - -0.124/1.0174 -
WEEKDAYMonday -15.513/0NA 0(0,0) -0.246/0.9629 0.782(0.118,5.162) -0.672/1.0851 0.511(0.061,4.283)
WEEKDAYSaturday -1.082/1.4201 0.339(0.021,5.482) 0.454/1.0477 1.575(0.202,12.275) 0.536/1.1023 1.708(0.197,14.822)
WEEKDAYSunday -0.269/1.1276 0.764(0.084,6.967) -0.234/1.0488 0.791(0.101,6.182) 0.433/1.05 1.542(0.197,12.076)
WEEKDAYThursday 0.516/1.0964 1.676(0.195,14.372) -1.373/1.3855 0.253(0.017,3.83) 0.296/1.119 1.344(0.15,12.047)
WEEKDAYTuesday 0.01/1.083 1.01(0.121,8.438) -0.548/1.0827 0.578(0.069,4.827) -0.678/1.1925 0.507(0.049,5.253)
WEEKDAYWednesday -1.1/1.1549 0.333(0.035,3.201) -0.477/0.9758 0.621(0.092,4.201) -1.794/1.3542 0.166(0.012,2.364)
COUNT 0.014/0.109 1.014(0.819,1.256) 0.042/0.0961 1.043(0.864,1.259) 0.021/0.1019 1.021(0.836,1.247)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And there is no significant predictors in your model. On the contrary, the multinomial regression using the example data shows that predictors are significant:
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
epiDisplay::mlogit.display(test, decimal=3, alpha=0.05)
Outcome =prog2; Referent group = academic
general vocation
Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI)
(Intercept) 2.852/1.1664* - 5.218/1.1636*** -
sesmiddle -0.533/0.4437 0.587(0.246,1.4) 0.291/0.4764 1.338(0.526,3.404)
seshigh -1.163/0.5142* 0.313(0.114,0.856) -0.983/0.5956 0.374(0.116,1.203)
write -0.058/0.0214** 0.944(0.905,0.984) -0.114/0.0222*** 0.893(0.855,0.932)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Upvotes: 2