Reputation: 1082
I'm probably just missing something silly here but I can't seem to manually replicate the predicted values from this model. I'm following this example
library('foreign')
library('nnet')
library('tidyverse')
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
ml = ml %>%
mutate(prog2 = fct_relevel(prog, "academic"))
# Fit a very basic model of the students choice of program
# as a function of their socioeconmic status and writing score:
test <- multinom(prog2 ~ ses + write, data = ml)
summary(test)
# If we wanted to calculate the probability of a high SES student
# with a median writing score picking a vocational program,
# we should be able to do this:
coef = summary(test)$coefficients[2, c(1, 3:4)]
log_odds = sum(coef * c(1, 1, median(ml$write)))
prob = exp(log_odds)/(1 + exp(log_odds))
prob
# from preditions:
ml %>%
bind_cols(as_tibble(predict(test, type = 'probs'))) %>%
filter(ses == 'high', write == median(write))
I'm getting 13.0% from my manual calculation and the predict function gives 10.8%. What did I miss?
Upvotes: 2
Views: 547
Reputation: 8019
The prediction of a multinomial logistic model on the link & response scale can be obtained as follows (key is that the inverse link function for multinomial is the softmax function, not the inverse logit) :
library('foreign')
library('nnet')
library('tidyverse')
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
ml = ml %>%
mutate(prog2 = fct_relevel(prog, "academic"))
# Fit a very basic model of the students choice of program
# as a function of their socioeconmic status and writing score:
fit <- multinom(prog2 ~ ses + write, data = ml)
summary(fit)
# model predictions on link scale
X <- as.matrix(data.frame('(Intercept)'=1, sesmiddle=0, seshigh=1, write=median(ml$write), check.names=F))
# or X <- model.matrix(fit) to use model matrix
betahat <- t(rbind(0, coef(fit))) # model coefficients, with expicit zero row added for reference category & transposed
preds_link <- X %*% betahat # predictions on the link scale (with explicit zero for reference level included here, sometimes this level is dropped)
colnames(preds_link) <- fit$lev
# model prediction on response / probability scale
softMax <- function(eta){ # softmax function = inverse link function for multinomial
exp_eta <- exp(eta)
return(sweep(exp_eta, 1, STATS=rowSums(exp_eta), FUN="/"))
}
preds_response <- softMax(preds_link)
preds_response
# academic general vocation
# [1,] 0.721014 0.1710377 0.1079482
# this matches
ml %>%
bind_cols(as_tibble(predict(fit, type = 'probs'))) %>%
filter(ses == 'high', write == median(write))
# id female ses schtyp prog read write math science socst honors awards cid prog2 academic
# 71 90 female high public academic 42 54 50 50 52 not enrolled 1 8 academic 0.721014
# 92 130 female high public general 43 54 55 55 46 not enrolled 1 10 general 0.721014
# 140 97 male high public academic 60 54 58 58 61 not enrolled 1 14 academic 0.721014
# 157 96 female high public academic 65 54 61 58 56 not enrolled 1 16 academic 0.721014
# general vocation
# 71 0.1710377 0.1079482
# 92 0.1710377 0.1079482
# 140 0.1710377 0.1079482
# 157 0.1710377 0.1079482
Upvotes: 0