kaseyzapatka
kaseyzapatka

Reputation: 169

ensure same N across nested multinomial models

I'm running a set of nested models and need to ensure all models have the same N and use the same observations.

In Stata, I could run the model with the smallest N, generate a flag of the observations used in the model using e(sample) command, and then use that as a filter when I run the regression. Does anyone know who to do this in R? Using tidy() or augment() doesn't seem to work for multinomial models using the multinom package.

Thanks for the help. Sample code using palmerpenguins package:

install.packages("palmerpenguins")
library(palmerpenguins)

head(penguins)
View(penguins)

# models 
multinom_islands <- list(
  "model 1"  = nnet::multinom(island ~ bill_length_mm + bill_depth_mm,
                              data = penguins, trace = FALSE, na.action = "na.exclude"),
  "model 2"  = nnet::multinom(island ~ bill_length_mm + bill_depth_mm + sex + year, 
                              data = penguins, trace = FALSE, na.action = "na.exclude"))


# tables 
modelsummary(multinom_islands, group = y.level + term ~ model, 
             exponentiate = TRUE, 
             stars = c("*"=.05, "**"=.01, "***"=0.001), 
             title = "Nested multinomial regression models predicting island",
             output = "markdown")

Upvotes: 0

Views: 75

Answers (1)

Vincent
Vincent

Reputation: 17725

The solution you describe is Stata would not be guaranteed to work unless the variables in the “small” model are a subset of the variables in the “large” model.

One way to be sure you get the expected results is to begin by creating a new dataset with all the variables from all models. Then, we use the na.omit function to eliminate all missing data. Finally, we estimate the models. This does not guarantee the largest possible sample size, but it guarantees that all models will have the same number of observations:

library(modelsummary)
library(palmerpenguins)
library(nnet)

variables <- c("island", "bill_length_mm", "bill_depth_mm", "sex", "year")
dat <- penguins[, variables]
dat <- na.omit(dat)

mod <- list(
  multinom(island ~ bill_length_mm + bill_depth_mm,
           data = dat, trace = FALSE),
  multinom(island ~ bill_length_mm + bill_depth_mm + sex + year,
           data = dat, trace = FALSE))

modelsummary(mod, group = y.level + term ~ model, output = "markdown")
Model 1 Model 2
Dream (Intercept) -16.087 0.008
(2.340) (0.000)
bill_length_mm 0.008 0.096
(0.027) (0.032)
bill_depth_mm 0.899 1.219
(0.104) (0.116)
sexmale -2.202
(0.286)
year -0.012
(0.001)
Torgersen (Intercept) -9.346 -0.006
(3.297) (0.000)
bill_length_mm -0.209 -0.149
(0.047) (0.049)
bill_depth_mm 0.966 1.133
(0.157) (0.155)
sexmale -1.302
(0.171)
year -0.007
(0.002)
Num.Obs. 333 333
AIC 482.9 458.4
edf 6.000 10.000

Upvotes: 1

Related Questions