Reputation: 35
I am analyzing the Poisson regression of a data count. Poisson requires that the variance and mean is equal, so I am checking the dispersion to ensure this. For the dispersion I am using two methods:
> pm <- glm(myCounts ~ campaign, d, family = poisson)
> summary(pm)
Call:
glm(formula = myCounts ~ campaign, family = poisson, data = d)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.074 -1.599 -0.251 1.636 6.399
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.955870 0.032174 154.03 <2e-16 ***
campaign -0.025879 0.001716 -15.08 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 428.04 on 35 degrees of freedom
Residual deviance: 195.81 on 34 degrees of freedom
AIC: 426.37
Number of Fisher Scoring iterations: 4
> dispersiontest(pm)
Overdispersion test
data: pm
z = 3.1933, p-value = 0.0007032
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
5.53987
> # Calculate dispersion with Negative Binomial
> nb_reg <- glm.nb(myCounts ~ campaign, data=d)
> summary.glm(nb_reg)
Call:
glm.nb(formula = myCounts ~ campaign, data = d, init.theta = 22.0750109,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9235 -0.7083 -0.1776 0.6707 2.4495
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.914728 0.082327 59.697 < 2e-16 ***
campaign -0.023471 0.003965 -5.919 1.1e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(22.075) family taken to be 1.069362)
Null deviance: 76.887 on 35 degrees of freedom
Residual deviance: 35.534 on 34 degrees of freedom
AIC: 325.76
Number of Fisher Scoring iterations: 1
As you can see, NB provides a 1.069362 dispersion. However, dispersiontest() results on 5.5 with clear overdispersion. If I am not wrong AER is not a parametric test, so we can only know if there is a over/under-dispersion or not. Nevertheless, both methods contradict.
Do somebody know why?
Upvotes: 2
Views: 2296
Reputation: 1
I think everything worked well, your test says that your data are overdispersed. I fact, p<0.05 means that you fail to accept the null hypothesis, and your hypotheses are:
H0: data are not oberdispersed H1: data are overdispersed
In your case the probability that data are not overdispersed is lower than 0.05. So, you "can say" that your data are overdispersed, which is coherent with your negative binomial output.
Upvotes: 0
Reputation: 46888
In glm.nb(), the variance is parameterized as 𝜇+𝜇^2/𝜃 where 𝜇 is your mean (See more at this discussion) and 𝜃 is theta, whereas in a poisson it is ϕ * 𝜇
where ϕ is the dispersion 5.53987 you see.
In negative binomial, the dispersion 1.069362
will not make sense, you need to look at theta inside the Negative Binomial(), which in your case is 22.075. I don't have your data, but using your intercept as a rough estimate of the mean:
mu = exp(4.914728)
theta = 22.0750109
variance = mu + (mu^2)/theta
variance
977.6339
variance / mu
[1] 7.173598
Which gives you something similar to the dispersion you. You should note that the dispersion you have is estimated from the full model, whereas I simply guessed one from your intercept.
Bottom line is the results don't disagree. You can use the negative binomial to model your data.
Below is an example that will illustrate the above relation. We simulate overdispersed data using negative binomial (that's the easiest):
y = c(rnbinom(100,mu=100,size=22),rnbinom(100,mu=200,size=22))
x = rep(0:1,each=100)
AER::dispersiontest(glm(y~x,family=poisson))
Overdispersion test
data: glm(y ~ x, family = poisson)
z = 8.0606, p-value = 3.795e-16
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
8.200214
Roughly, this is derived by dividing the variance in each group over mean in each group:
mean(tapply(y,x,var)/tapply(y,x,mean))
[1] 8.283044
And you can see the dispersion is showing 1, when in fact your data is overdispered:
summary(MASS::glm.nb(y~x))
Call:
MASS::glm.nb(formula = y ~ x, init.theta = 21.5193965, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.0413 -0.6999 -0.1275 0.5800 2.4774
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.66551 0.02364 197.36 <2e-16 ***
x 0.66682 0.03274 20.37 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(21.5194) family taken to be 1)
Upvotes: 2