Reputation: 51
I have a dataset of around 30 values and I would like to know whether these data fit a Poisson distribution. I would like to run an analysis such as a GLM on my data, and I found out that they don't follow a Normal distribution. One of my guesses is that they follow a Poisson distribution, but I need to check whether that is true.
Upvotes: 4
Views: 13242
Reputation: 174293
You could try a dispersion test, which relies on the fact that the mean of a Poisson distribution is equal to its variance. For a sample of n counts from a Poisson distribution, n-1 times the ratio of the sample variance to the sample mean should approximately follow a Chi-square distribution with n-1 degrees of freedom.
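Written out, the statistic is the sum of squared deviations from the sample mean, divided by the sample mean. The symbols D, \bar{x} and s^2 are notation introduced here for illustration, and the Chi-square reference distribution is only an approximation, which tends to be rougher when the mean is small:

```latex
D \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\bar{x}}
  \;=\; \frac{(n-1)\, s^2}{\bar{x}}
  \;\overset{\text{approx.}}{\sim}\; \chi^2_{n-1}
```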
You could implement it in R like this:
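dispersion_test <- function(x)
{
  # Dispersion statistic: sum of squared deviations divided by the mean,
  # which equals (n - 1) * var(x) / mean(x)
  stat <- sum((x - mean(x))^2) / mean(x)
  # Upper-tail probability under a Chi-square with n - 1 degrees of freedom
  upper <- 1 - pchisq(stat, length(x) - 1)
  # Two-sided p value: small for both over- and under-dispersion
  res <- 1 - 2 * abs(upper - 0.5)
  # Note: the last value printed is the p value of the test
  cat("Dispersion test of count data:\n",
      length(x), " data points.\n",
      "Mean: ", mean(x), "\n",
      "Variance: ", var(x), "\n",
      "Probability of being drawn from Poisson distribution: ",
      round(res, 3), "\n", sep = "")
  invisible(res)
}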
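This allows you to reject the null hypothesis that your data are Poisson distributed if the p value is below 0.05. If the p value is above 0.05, you cannot reject that hypothesis: the data are consistent with a Poisson distribution as far as their dispersion goes, although this does not prove that they follow one.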
Suppose I have the following data:
set.seed(1)
x <- rpois(30, 1)
x
# [1] 0 1 1 2 0 2 3 1 1 0 0 0 1 1 2 1 1 4 1 2 3 0 1 0 0 1 0 1 2 0
Then I can just do:
dispersion_test(x)
# Dispersion test of count data:
# 30 data points.
# Mean: 1.066667
# Variance: 1.098851
# Probability of being drawn from Poisson distribution: 0.841
A word of warning, however. With a sample size as small as 30, one cannot say with any confidence that your data are Poisson distributed. If my next data point turned out to be a 7, then the p value would fall below 0.05 and I would have to reject the null hypothesis that my data were Poisson distributed.
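To see how sensitive the result is, here is a minimal sketch that recomputes the p value with a 7 appended to the sample. The counts are typed in from the output shown earlier, and `dispersion_p` is a hypothetical helper name used only for this illustration; it returns the same two-sided p value as the function above without printing anything.

```r
# Counts copied from the example output above
x <- c(0, 1, 1, 2, 0, 2, 3, 1, 1, 0, 0, 0, 1, 1, 2, 1, 1, 4, 1, 2,
       3, 0, 1, 0, 0, 1, 0, 1, 2, 0)

# Hypothetical helper: two-sided p value of the dispersion test
dispersion_p <- function(x) {
  stat  <- sum((x - mean(x))^2) / mean(x)
  upper <- pchisq(stat, df = length(x) - 1, lower.tail = FALSE)
  2 * min(upper, 1 - upper)
}

p_before <- dispersion_p(x)        # original 30 counts
p_after  <- dispersion_p(c(x, 7))  # same counts plus a single 7

print(c(before = p_before, after = p_after))
```

A single extra observation is enough to move the conclusion from "not rejected" to "rejected", which is why a non-significant result on 30 points should be read cautiously.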
Upvotes: 5