jentila
jentila

Reputation: 35

AER dispersiontest() contradict negative binomial dispersion in R

I am analyzing the Poisson regression of a data count. Poisson requires that the variance and mean is equal, so I am checking the dispersion to ensure this. For the dispersion I am using two methods:

> pm <- glm(myCounts ~ campaign, d, family = poisson)
> summary(pm)

Call:
glm(formula = myCounts ~ campaign, family = poisson, data = d)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-4.074  -1.599  -0.251   1.636   6.399  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.955870   0.032174  154.03   <2e-16 ***
campaign    -0.025879   0.001716  -15.08   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 428.04  on 35  degrees of freedom
Residual deviance: 195.81  on 34  degrees of freedom
AIC: 426.37

Number of Fisher Scoring iterations: 4

> dispersiontest(pm)

    Overdispersion test

data:  pm
z = 3.1933, p-value = 0.0007032
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
   5.53987 

> # Calculate dispersion with Negative Binomial
> nb_reg <- glm.nb(myCounts ~ campaign, data=d)
> summary.glm(nb_reg)

Call:
glm.nb(formula = myCounts ~ campaign, data = d, init.theta = 22.0750109, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9235  -0.7083  -0.1776   0.6707   2.4495  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.914728   0.082327  59.697  < 2e-16 ***
campaign    -0.023471   0.003965  -5.919  1.1e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(22.075) family taken to be 1.069362)

    Null deviance: 76.887  on 35  degrees of freedom
Residual deviance: 35.534  on 34  degrees of freedom
AIC: 325.76

Number of Fisher Scoring iterations: 1

As you can see, NB provides a 1.069362 dispersion. However, dispersiontest() results on 5.5 with clear overdispersion. If I am not wrong AER is not a parametric test, so we can only know if there is a over/under-dispersion or not. Nevertheless, both methods contradict.

Do somebody know why?

Upvotes: 2

Views: 2296

Answers (2)

Cecilia Ba
Cecilia Ba

Reputation: 1

I think everything worked well, your test says that your data are overdispersed. I fact, p<0.05 means that you fail to accept the null hypothesis, and your hypotheses are:

H0: data are not oberdispersed H1: data are overdispersed

In your case the probability that data are not overdispersed is lower than 0.05. So, you "can say" that your data are overdispersed, which is coherent with your negative binomial output.

Upvotes: 0

StupidWolf
StupidWolf

Reputation: 46888

In glm.nb(), the variance is parameterized as 𝜇+𝜇^2/𝜃 where 𝜇 is your mean (See more at this discussion) and 𝜃 is theta, whereas in a poisson it is ϕ * 𝜇 where ϕ is the dispersion 5.53987 you see.

In negative binomial, the dispersion 1.069362 will not make sense, you need to look at theta inside the Negative Binomial(), which in your case is 22.075. I don't have your data, but using your intercept as a rough estimate of the mean:

mu = exp(4.914728)
theta = 22.0750109
variance = mu + (mu^2)/theta
variance
977.6339
variance / mu
[1] 7.173598

Which gives you something similar to the dispersion you. You should note that the dispersion you have is estimated from the full model, whereas I simply guessed one from your intercept.

Bottom line is the results don't disagree. You can use the negative binomial to model your data.

Below is an example that will illustrate the above relation. We simulate overdispersed data using negative binomial (that's the easiest):

y = c(rnbinom(100,mu=100,size=22),rnbinom(100,mu=200,size=22))
x = rep(0:1,each=100)

AER::dispersiontest(glm(y~x,family=poisson))

    Overdispersion test

data:  glm(y ~ x, family = poisson)
z = 8.0606, p-value = 3.795e-16
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
  8.200214 

Roughly, this is derived by dividing the variance in each group over mean in each group:

mean(tapply(y,x,var)/tapply(y,x,mean))
[1] 8.283044

And you can see the dispersion is showing 1, when in fact your data is overdispered:

    summary(MASS::glm.nb(y~x))

Call:
MASS::glm.nb(formula = y ~ x, init.theta = 21.5193965, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.0413  -0.6999  -0.1275   0.5800   2.4774  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.66551    0.02364  197.36   <2e-16 ***
x            0.66682    0.03274   20.37   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(21.5194) family taken to be 1)

Upvotes: 2

Related Questions