Estimate Dirichlet parameters from multinomial regression

Question

I am trying to use the predicted probabilities from a multinomial regression using multinom function from the nnet package in R to estimate the parameters for a Dirichlet distribution while taking into account the standard errors. Below is an example of what I'm trying to do.

I have two sets of categorical data as show below. As you can see, they have the same ratios but the second dataset has 10 times more observations.

df1 <- data.frame(x=c(rep("A", 100), rep("B", 200), rep("C", 300)))
df2 <- data.frame(x=c(rep("A", 1000), rep("B", 2000), rep("C", 3000)))

I conduct a simple multinomial regression that only uses the intercept.

library(nnet)
mn1 <- multinom(x~1, data = df1)
mn2 <- multinom(x~1, data = df2)

summary(mn1)
summary(mn2)

The predicted probabilities are calculated based on the coefficients. Since I'm only using the intercept, we could simply look at the distribution of the data to calculate the probabilities, of course. But the reason I want to use the multinomial regression is to use the standard errors from the models.

softmax <- function(x) {
  exp(x) / sum(exp(x))
}

pb1 <- softmax(c(0, coef(mn1)[,1]))
pb2 <- softmax(c(0, coef(mn2)[,1]))
pb1; pb2

Then, using the rdirichlet function from the dirmult package, I want to run simulations of the Dirichlet function that would derive from the multinomial regression. Here, I use the predicted probabilities as the alpha parameters for the Dirichlet distribution.

library(dirmult)
set.seed(9999)
dir1 <- as.data.frame(dirmult::rdirichlet(1000, alpha = c(pb1)))
dir2 <- as.data.frame(dirmult::rdirichlet(1000, alpha = c(pb2)))

colnames(dir1) <- c("A", "B", "C")
colnames(dir2) <- c("A", "B", "C")

dir1 <- melt(dir1)
dir2 <- melt(dir2)

As can be seen in the following figures, the Dirichlet distributions are basically the same.

library(ggplot2)
ggplot(dir1, aes(value, color = variable)) +
  geom_density()
ggplot(dir2, aes(value, color = variable)) +
  geom_density()

The problem is that this does not take into account the standard errors of the multinomial regression. In theory, more observations leads to lower standard errors, which should lead to larger alpha parameters. How can I incorporate the standard errors of the multinomial regression into the alpha parameters of the Dirichlet distribution? Is there a better way to estimate the parameters for a Dirichlet distribution based on simple categorical distributions?

The purpose of this is to create simulations of categorical distributions to estimate the confidence intervals of higher-order measures based on the categorical distributions. Let me know if there is another approach that should be used for this. I appreciate the help!

Estimate Dirichlet parameters from multinomial regression

Answers (1)

Related Questions