Reputation: 6304
I have multinomial compositional data for 100 categories from two groups, where each is represented by two ages:
set.seed(1)
df <- data.frame(group = c(rep("g1",200),rep("g2",200)),
age = c(rep("a1",100),rep("a2",100),rep("a1",100),rep("a2",100)),
category = rep(paste0("c",1:100),4),
n = c(rmultinom(1,7000,pgamma(shape=0.8,rate=0.1,q=seq(0.01,1,0.01))),
rmultinom(1,5000,pgamma(shape=0.8,rate=0.3,q=seq(0.01,1,0.01))),
rmultinom(1,1800,pgamma(shape=0.5,rate=0.1,q=seq(0.01,1,0.01))),
rmultinom(1,1200,pgamma(shape=0.9,rate=0.1,q=seq(0.01,1,0.01)))),
stringsAsFactors = F)
I want to fit a regression model to estimate the interaction effect of the category * group
, while controlling for age
.
So far, I'm trying to use a multicategorical
glm
(with a binomial(link = 'logit')
), to a data.frame
where I transform the df$n
(total counts) to a binomial
(binary
) form:
binomial.df <- do.call(rbind,lapply(unique(df$group),function(g){
do.call(rbind,lapply(unique(dplyr::filter(df,group == g)$age),function(a){
do.call(rbind,lapply(unique(dplyr::filter(df,group == g)$category),function(t){
sum.non.category <- sum(dplyr::filter(df,group == g & age == a & category != t)$n)
sum.category <- sum(dplyr::filter(df,group == g & age == a & category == t)$n)
data.frame(group = g,age = a,category = t,assigned.category = c(rep(0,sum.non.category),rep(1,sum.category)))
}))
}))
}))
binomial.df$group <- factor(binomial.df$group, levels = c("g1","g2"))
binomial.df$age <- factor(binomial.df$age, levels = c("a1","a2"))
binomial.df$category <- factor(binomial.df$category, levels = paste0("c",1:100))
mm.fit <- glm(assigned.category ~ category * group + age,data = binomial.df, family = binomial(link = 'logit'))
Clearly for this size of data the glm
call will run for days or even longer, so I'm looking for a more tractable way.
Any idea?
BTW, I tried using nnet
's multinom
first:
df$group <- factor(df$group, levels = c("g1","g2"))
df$age <- factor(df$age, levels = c("a1","a2"))
df$category <- factor(df$category, levels = paste0("c",1:100))
mm.fit <- nnet::multinom(n ~ category * group + age, data=df)
But I get:
Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE, softmax = TRUE, :
too many (22220) weights
Upvotes: 0
Views: 382
Reputation: 7989
The nnet::multinom
issue you can resolve by modifying your multinom
call to include the argument MaxNWts=100000
:
mm.fit <- nnet::multinom(n ~ category * group + age, data=df, MaxNWts=100000)
To fit large multinomial models in R you could also look into the h20
package :
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html
Upvotes: 0