I have fit a multinomial regression model in R. For reasons out of scope here, I would like to compute odds ratios / relative risks (with confidence intervals) for a categorical interaction term, at all margins of the interaction, using only the summary() output (the coefficients, standard errors, etc.). At this point I would be willing to do it in either R or SAS. To illustrate my point:
emmeans(model, ~ var1 | var2)
is the only approach I can see, where model is a fitted model object. Is there instead a function that can act on the data frame returned by df <- summary(model)? What I want is something like:
emmeans(df, ~ var1 | var2)
At this point I'd even be willing to export df to an Excel file and analyze it in SAS, if there is a way to do that, so any experts are welcome to help. It should be possible, I think, because I have the coefficients, terms, and standard errors needed to compute the OR/RR.
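For illustration, this is the kind of hand computation I have in mind (just a sketch; est and se are hypothetical placeholders for one log-odds coefficient and its standard error taken from the summary output):
est <- 0.85                        # hypothetical coefficient from summary(model)
se  <- 0.32                        # its (hypothetical) standard error
exp(est)                           # point estimate of the odds ratio
exp(est + c(-1, 1) * 1.96 * se)    # approximate 95% Wald confidence interval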
Thanks!
Here's an example that may be suitable. It's adapted from an example for nnet::multinom.
Data and model:
library(nnet)
library(MASS)
library(emmeans)
bwt <- with(birthwt, {
    race <- factor(race, labels = c("white", "black", "other"))
    ptd <- factor(ptl > 0)
    ftv <- factor(ftv)
    levels(ftv)[-(1:2)] <- "2+"
    data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0),
               ptd, ht = (ht > 0), ui = (ui > 0), ftv)
})
bwt.mu <- multinom(low ~ ., bwt)
## # weights: 12 (11 variable)
## initial value 131.004817
## iter 10 value 98.029803
## final value 97.737759
## converged
Here are the estimated marginal means for low (the response) given race. These are on the "prob" scale by default.
(EMM <- emmeans(bwt.mu, ~ low | race))
## race = white:
## low prob SE df lower.CL upper.CL
## 0 0.466 0.0827 11 0.2837 0.648
## 1 0.534 0.0827 11 0.3523 0.716
##
## race = black:
## low prob SE df lower.CL upper.CL
## 0 0.262 0.0769 11 0.0932 0.432
## 1 0.738 0.0769 11 0.5682 0.907
##
## race = other:
## low prob SE df lower.CL upper.CL
## 0 0.334 0.0726 11 0.1747 0.494
## 1 0.666 0.0726 11 0.5059 0.825
##
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv
## Confidence level used: 0.95
I'm not sure exactly which odds ratios you want, but the route to getting them is to re-scale everything to the logit scale.
REMM <- regrid(EMM, "logit")
Now we can get odds ratios for the responses:
pairs(REMM, type = "response")
## race = white:
## contrast odds.ratio SE df null t.ratio p.value
## low0 / low1 0.760 0.505 11 1 -0.414 0.6871
##
## race = black:
## contrast odds.ratio SE df null t.ratio p.value
## low0 / low1 0.127 0.101 11 1 -2.600 0.0247
##
## race = other:
## contrast odds.ratio SE df null t.ratio p.value
## low0 / low1 0.253 0.165 11 1 -2.111 0.0585
##
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv
## Tests are performed on the log odds ratio scale
Or we can turn the tables and get them for comparing the races:
pairs(REMM, by = "low", type = "response")
## low = 0:
## contrast odds.ratio SE df null t.ratio p.value
## white / black 2.449 1.010 11 1 2.172 0.1204
## white / other 1.734 0.602 11 1 1.587 0.2914
## black / other 0.708 0.301 11 1 -0.812 0.7034
##
## low = 1:
## contrast odds.ratio SE df null t.ratio p.value
## white / black 0.408 0.168 11 1 -2.172 0.1204
## white / other 0.577 0.200 11 1 -1.587 0.2914
## black / other 1.412 0.599 11 1 0.812 0.7034
##
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv
## P value adjustment: tukey method for comparing a family of 3 estimates
## Tests are performed on the log odds ratio scale
In the latter, since there are only two multinomial categories, the second set of odds ratios is simply the reciprocal of the corresponding ones in the first set, so really only three tests are done, not six. And in the first set, we actually get the squares of the odds ratios: for example, the value for race = white is not .466/.534 as we might expect, but rather (.466/.534) / (.534/.466), i.e. (.466/.534)^2. Dependencies like this make it a little tricky to create and interpret sensible odds ratios with multinomial models.
Created on 2022-09-11 by the reprex package (v2.0.1)
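A quick arithmetic check of that squaring claim, using the rounded probabilities from the EMM table above (so it only approximately reproduces the 0.760 reported):
(0.466 / 0.534)^2
## [1] 0.761534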
I think maybe it makes more sense to regrid to the log scale rather than logit...
LEMM <- regrid(EMM, "log")
pairs(LEMM, type = "response")
## race = white:
## contrast ratio SE df null t.ratio p.value
## low0 / low1 0.872 0.290 11 1 -0.414 0.6871
##
## race = black:
## contrast ratio SE df null t.ratio p.value
## low0 / low1 0.356 0.141 11 1 -2.600 0.0247
##
## race = other:
## contrast ratio SE df null t.ratio p.value
## low0 / low1 0.503 0.164 11 1 -2.111 0.0585
##
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv
## Tests are performed on the log scale
These are the odds ratios themselves rather than their squares: with only two categories, the contrast being back-transformed is log(p) - log(1 - p) = logit(p), i.e. the log odds.
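Checking again with the rounded table values, the back-transformed contrast for race = white is just the plain ratio of the two probabilities, matching the 0.872 shown above:
exp(log(0.466) - log(0.534))   # identical to 0.466 / 0.534
## [1] 0.8726592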
Going the other way, we obtain ratios of probabilities:
pairs(LEMM, by = "low", type = "response")
## low = 0:
## contrast ratio SE df null t.ratio p.value
## white / black 1.774 0.498 11 1 2.044 0.1477
## white / other 1.392 0.293 11 1 1.572 0.2976
## black / other 0.785 0.238 11 1 -0.800 0.7109
##
## low = 1:
## contrast ratio SE df null t.ratio p.value
## white / black 0.724 0.109 11 1 -2.134 0.1280
## white / other 0.803 0.116 11 1 -1.522 0.3188
## black / other 1.108 0.138 11 1 0.827 0.6947
##
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv
## P value adjustment: tukey method for comparing a family of 3 estimates
## Tests are performed on the log scale
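Finally, since the question asks for confidence intervals: the result of any pairs() call above is itself an emmGrid object, so you can pass it to confint() (or to summary() with infer = TRUE) to get intervals on the response (ratio) scale, for example:
confint(pairs(REMM, by = "low"), type = "response")
(Output not shown; it lists the same contrasts as above, with confidence limits in place of the test statistics.)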