biostatguy12
biostatguy12

Reputation: 659

function for odds ratio and/or relative risk calculation given list of model summary data in r or sas?

I have computed a multinomial regression model in R. However, for reasons out of scope, I would like to compute odds ratios/RR (with confidence intervals) on a categorical interaction term at all margins of the interaction with summary() data (such as the list of coefficients and standard errors etc.). At this point I would be willing to do it in either R or SAS. To illustrate my point:

emmeans(model, var1|var2)

is the only thing I can see where model is a model object, but instead is there a function where I can act on a data frame output by df<-summary(model)?:

want:

emmeans(df, var1|var2)

At this point I'd be willing to extract df as an excel file and analyzing in SAS if there is such a way, so any experts are welcome to help. It should be possible because I have the coefficients, terms, and standard errors to compute OR/RR, I think.

Thanks!

Upvotes: 1

Views: 400

Answers (1)

Russ Lenth
Russ Lenth

Reputation: 6780

Here's an example that may be suitable. It's adapted from an example for nnet::multinom

Data and model:

library(nnet)
library(MASS)
library(emmeans)

bwt <- with(birthwt, {
    race <- factor(race, labels = c("white", "black", "other"))
    ptd <- factor(ptl > 0)
    ftv <- factor(ftv)
    levels(ftv)[-(1:2)] <- "2+"
    data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0),
               ptd, ht = (ht > 0), ui = (ui > 0), ftv)
})

bwt.mu <- multinom(low ~ ., bwt)
## # weights:  12 (11 variable)
## initial  value 131.004817 
## iter  10 value 98.029803
## final  value 97.737759 
## converged

Here are the estimated marginal means for low (the response) given race. These are on the "prob" scale by default.

(EMM <- emmeans(bwt.mu, ~ low | race))
## race = white:
##  low  prob     SE df lower.CL upper.CL
##  0   0.466 0.0827 11   0.2837    0.648
##  1   0.534 0.0827 11   0.3523    0.716
## 
## race = black:
##  low  prob     SE df lower.CL upper.CL
##  0   0.262 0.0769 11   0.0932    0.432
##  1   0.738 0.0769 11   0.5682    0.907
## 
## race = other:
##  low  prob     SE df lower.CL upper.CL
##  0   0.334 0.0726 11   0.1747    0.494
##  1   0.666 0.0726 11   0.5059    0.825
## 
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv 
## Confidence level used: 0.95

I'm not sure what odds ratios you want, but the route to getting odds ratios is to re-scale everything to the logit scale.

REMM <- regrid(EMM, "logit")

Now we can get odds ratios for the responses:

pairs(REMM, type = "response")
## race = white:
##  contrast    odds.ratio    SE df null t.ratio p.value
##  low0 / low1      0.760 0.505 11    1  -0.414  0.6871
## 
## race = black:
##  contrast    odds.ratio    SE df null t.ratio p.value
##  low0 / low1      0.127 0.101 11    1  -2.600  0.0247
## 
## race = other:
##  contrast    odds.ratio    SE df null t.ratio p.value
##  low0 / low1      0.253 0.165 11    1  -2.111  0.0585
## 
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv 
## Tests are performed on the log odds ratio scale

Or we can turn the tables and get them for comparing the races:

pairs(REMM, by = "low", type = "response")
## low = 0:
##  contrast      odds.ratio    SE df null t.ratio p.value
##  white / black      2.449 1.010 11    1   2.172  0.1204
##  white / other      1.734 0.602 11    1   1.587  0.2914
##  black / other      0.708 0.301 11    1  -0.812  0.7034
## 
## low = 1:
##  contrast      odds.ratio    SE df null t.ratio p.value
##  white / black      0.408 0.168 11    1  -2.172  0.1204
##  white / other      0.577 0.200 11    1  -1.587  0.2914
##  black / other      1.412 0.599 11    1   0.812  0.7034
## 
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv 
## P value adjustment: tukey method for comparing a family of 3 estimates 
## Tests are performed on the log odds ratio scale

In the latter, since there are only two multinomial categories, the second set of odds ratios is the reciprocal of the respective ones in the first set. So really only 3 tests are done, not six. And in the first set, we really get the squares of the odds ratios -- for example, the odds ratio for race = white is not .466/.534 as we might expect, but rather (.466/.534) / (.534/.466). Dependencies like this make it a little tricky to create and interpret sensible odds ratios with multinomial models.

Created on 2022-09-11 by the reprex package (v2.0.1)

Afterthought

I think maybe it makes more sense to regrid to the log scale rather than logit...

LEMM <- regrid(EMM, "log")

pairs(LEMM, type = "response")
## race = white:
##  contrast    ratio    SE df null t.ratio p.value
##  low0 / low1 0.872 0.290 11    1  -0.414  0.6871
## 
## race = black:
##  contrast    ratio    SE df null t.ratio p.value
##  low0 / low1 0.356 0.141 11    1  -2.600  0.0247
## 
## race = other:
##  contrast    ratio    SE df null t.ratio p.value
##  low0 / low1 0.503 0.164 11    1  -2.111  0.0585
## 
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv 
## Tests are performed on the log scale

The above are actually the odds ratios rather than their squares because we are back-transforming log(p) - log(1-p) = logit(p).

Going the other way, we obtain ratios of probabilities:

pairs(LEMM, by = "low", type = "response")
## low = 0:
##  contrast      ratio    SE df null t.ratio p.value
##  white / black 1.774 0.498 11    1   2.044  0.1477
##  white / other 1.392 0.293 11    1   1.572  0.2976
##  black / other 0.785 0.238 11    1  -0.800  0.7109
## 
## low = 1:
##  contrast      ratio    SE df null t.ratio p.value
##  white / black 0.724 0.109 11    1  -2.134  0.1280
##  white / other 0.803 0.116 11    1  -1.522  0.3188
##  black / other 1.108 0.138 11    1   0.827  0.6947
## 
## Results are averaged over the levels of: smoke, ptd, ht, ui, ftv 
## P value adjustment: tukey method for comparing a family of 3 estimates 
## Tests are performed on the log scale

Upvotes: 1

Related Questions