casanoan
casanoan

Reputation: 27

Multinomial Logistic Regression in R

I have tried to find many resources about multinomial regression. However, I am having a hard time finding visualizations that would show probability of a multiclass response variable given a multiclass dependent variable. I have an example below where I want to predict EQUIPMENT based on WEEKDAY and COUNT. How can I do this using nnet::multinom() and provide a meaningful visualization to help understand predicted classes.

  dummy_data <- function(size){
  LOCATION <- sample(c("LOC_A", "LOC_B", "LOC_B"), size, replace = T, prob = c(0.4, 0.4, 0.2))
  EQUIPMENT <- sample(c("EQUIP_A", "EQUIP_B", "EQUIP_C", "EQUIP_D"), size, replace = TRUE)
  df <- data.frame(LOCATION, EQUIPMENT)
  df$COUNT <- sample(c(1:10), size, replace = TRUE)
  startTime <- as.POSIXct("2016-01-01")
  endTime <- as.POSIXct("2019-01-31")
  df$DATE <- as.Date(sample(seq(startTime, endTime, 1), size))
  df$WEEKDAY <- weekdays(as.Date(df$DATE))

  return(df)
}

df<- dummy_data(100)
library(nnet)
model <- nnet::multinom(EQUIPMENT ~ WEEKDAY + COUNT, data = df)

How should I continue from here? I tried going off of this example and would like a similar visualization to represent probabilities of class broken down by WEEKDAY and EQUIPMENT. I'm having a hard time following this example. Can anyone help? https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/

Upvotes: 0

Views: 1272

Answers (1)

hj0602
hj0602

Reputation: 36

I think what you're looking for in the UCLA website example is from "look at the averaged predicted probabilities for different values of the continuous predictor variable write (COUNT in your data) within each level of ses (WEEKDAY in your data)."

In order to do so, I'll add the following lines to what you did before multinom:

    df$EQUIPMENT <- as.factor(df$EQUIPMENT)
    df$EQUIPMENT <- relevel(df$EQUIPMENT, ref = "EQUIP_A")

    df$WEEKDAY <- as.factor(df$WEEKDAY)
    df$WEEKDAY <- relevel(df$WEEKDAY, ref = "Sunday")
   
    model <- nnet::multinom(EQUIPMENT ~ WEEKDAY + COUNT, data = df)

And then follow the example by

    dmodel <- data.frame(WEEKDAY = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), each = 10), COUNT = rep(c(1:10),7))
    pp.model <- cbind(dmodel, predict(model, newdata = dmodel, type = "probs", se = TRUE))


    lmodel <- melt(pp.model, id.vars = c("WEEKDAY", "COUNT"), value.name = "probability")

    ggplot(lmodel, aes(x = COUNT, y = probability, colour = WEEKDAY)) + geom_line() + facet_grid(variable ~., scales = "free")

multinom

However, you can see that the figure looks quite different from the UCLA example. Why? Before making the plot, we need to check whether the predictors are significant or not first. To do so simply (rather than following the steps in the example link), I'll use the epiDisplay package:

    epiDisplay::mlogit.display(model, decimal=3, alpha=0.05)

    Outcome =EQUIPMENT; Referent group = EQUIP_A 
                      EQUIP_B                           EQUIP_C                           EQUIP_D                          
                      Coeff./SE     RRR(95%CI)          Coeff./SE     RRR(95%CI)          Coeff./SE     RRR(95%CI)         
    (Intercept)      -0.083/1.0419 -                   0.036/0.9569  -                   -0.124/1.0174 -                  
    WEEKDAYMonday    -15.513/0NA   0(0,0)              -0.246/0.9629 0.782(0.118,5.162)  -0.672/1.0851 0.511(0.061,4.283) 
    WEEKDAYSaturday  -1.082/1.4201 0.339(0.021,5.482)  0.454/1.0477  1.575(0.202,12.275) 0.536/1.1023  1.708(0.197,14.822)
    WEEKDAYSunday    -0.269/1.1276 0.764(0.084,6.967)  -0.234/1.0488 0.791(0.101,6.182)  0.433/1.05    1.542(0.197,12.076)
    WEEKDAYThursday  0.516/1.0964  1.676(0.195,14.372) -1.373/1.3855 0.253(0.017,3.83)   0.296/1.119   1.344(0.15,12.047) 
    WEEKDAYTuesday   0.01/1.083    1.01(0.121,8.438)   -0.548/1.0827 0.578(0.069,4.827)  -0.678/1.1925 0.507(0.049,5.253) 
    WEEKDAYWednesday -1.1/1.1549   0.333(0.035,3.201)  -0.477/0.9758 0.621(0.092,4.201)  -1.794/1.3542 0.166(0.012,2.364) 
    COUNT            0.014/0.109   1.014(0.819,1.256)  0.042/0.0961  1.043(0.864,1.259)  0.021/0.1019  1.021(0.836,1.247) 

    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  

And there is no significant predictors in your model. On the contrary, the multinomial regression using the example data shows that predictors are significant:

    ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
    ml$prog2 <- relevel(ml$prog, ref = "academic")

    test <- multinom(prog2 ~ ses + write, data = ml)
    epiDisplay::mlogit.display(test, decimal=3, alpha=0.05)

    Outcome =prog2; Referent group = academic 
               general                            vocation                           
               Coeff./SE       RRR(95%CI)         Coeff./SE        RRR(95%CI)        
   (Intercept) 2.852/1.1664*   -                  5.218/1.1636***  -                 
   sesmiddle   -0.533/0.4437   0.587(0.246,1.4)   0.291/0.4764     1.338(0.526,3.404)
   seshigh     -1.163/0.5142*  0.313(0.114,0.856) -0.983/0.5956    0.374(0.116,1.203)
   write       -0.058/0.0214** 0.944(0.905,0.984) -0.114/0.0222*** 0.893(0.855,0.932)

   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Upvotes: 2

Related Questions