nickbloom
nickbloom

Reputation: 36

Why is the likelihood/AIC of my poisson regression infinite?

I am trying to evaluate themodel fit of several regressions in R, and I have run into a problem I have had multiple times now: the log-likelihood of my Poisson regression is infinite.

I'm using a non-integer dependent variable (Note: I know what I'm doing in this regard), and I'm wondering if maybe that's the problem. However, I don't get an infinite log-likelihood when running the regression with glm.nb.

Code to reproduce the issue is below.

Edit: the problem appears to go away when I coerce the DV to integer. Any idea how to get log likelihood from Poissons with non-integer DVs?

# Input Data
so_data <- data.frame(dv = c(21.0552722691125, 24.3061351414885, 7.84658638053276, 
25.0294679770848, 15.8064731063311, 10.8171744654056, 31.3008088413026, 
2.26643928259238, 18.4261153345417, 5.62915828161753, 17.0691184593063, 
1.11959635820499, 30.0154935602592, 23.0000809735738, 28.4389825676123, 
27.7678405415711, 23.7108405071757, 23.5070651053276, 14.2534787168392, 
15.2058525068363, 19.7449094187771, 2.52384709295823, 29.7081691356397, 
32.4723790240354, 19.2147002673637, 61.7911384519901, 10.5687170234821, 
23.9047421013736, 18.4889651451222, 13.0360878554798, 15.1752866581849, 
11.5205948111817, 31.3539840929108, 31.7255952728076, 25.3034625215724, 
5.00013988265465, 30.2037887018226, 1.86123112349445, 3.06932041603219, 
22.6739418581257, 6.33738321053804, 24.2933951601142, 14.8634827414491, 
31.8302947881089, 34.8361908525564, 1.29606416941288, 13.206844629927, 
28.843579313401, 25.8024295609021, 14.4414831628722, 18.2109680632694, 
14.7092063453463, 10.0738043919183, 28.4124482962025, 27.1004208775326, 
1.31350378236957, 14.3009307888745, 1.32555197766214, 2.70896028922312, 
3.88043749517381, 3.79492216916016, 19.4507965653633, 32.1689088941444, 
2.61278585713499, 41.6955885902228, 2.13466761675063, 30.4207256294235, 
24.8231524369244, 20.7605955978196, 17.2182798298094, 2.11563574288652, 
12.290778250655, 0.957467139696772, 16.1775287334746))

# Run Model
p_mod <- glm(dv ~ 1, data = so_data, family = poisson(link = 'log'))

# Be Confused
logLik(p_mod)

Upvotes: 2

Views: 6977

Answers (2)

Eran
Eran

Reputation: 555

Poisson log-likelihood involves calculating log(factorial(x)) (https://www.statlect.com/fundamentals-of-statistics/Poisson-distribution-maximum-likelihood). For values larger than 30 it has to be done using Stirling's approximation formula in order to avoid exceeding the limit of computer arithmetic. Sample code in Python:

# define a likelihood function. https://www.statlect.com/fundamentals-of- statistics/Poisson-distribution-maximum-likelihood
def loglikelihood_f(lmba, x):
    #Using Stirling formula to avoid calculation of factorial.
    #logfactorial(n) = n*ln(n) - n
    n = x.size
    logfactorial = x*np.log(x+0.001) - x #np.log(factorial(x))
    logfactorial[logfactorial == -inf] = 0
    result =\
        - np.sum(logfactorial) \
        - n * lmba \
        + np.log(lmba) * np.sum(x)
    return result

Upvotes: 0

Ben Bolker
Ben Bolker

Reputation: 226182

Elaborating on @ekstroem's comment: the Poisson distribution is only supported over the non-negative integers (0, 1, ...). So, technically speaking, the probability of any non-integer value is zero -- although R does allow for a little bit of fuzz, to allow for round-off/floating-point representation issues:

> dpois(1,lambda=1)
[1] 0.3678794
> dpois(1.1,lambda=1)
[1] 0
Warning message:
In dpois(1.1, lambda = 1) : non-integer x = 1.100000
> dpois(1+1e-7,lambda=1)  ## fuzz
[1] 0.3678794

It is theoretically possible to compute something like a Poisson log-likelihood for non-integer values:

my_dpois <- function(x,lambda,log=FALSE) {
   LL <- -lambda+x*log(lambda)-lfactorial(x)
   if (log) LL else exp(LL)
}

but I would be very careful - some quick tests with integrate suggest it integrates to 1 (after I fixed the bug in it), but I haven't checked more carefully that this is really a well-posed probability distribution. (On the other hand, some reasonable-seeming posts on CrossValidated suggest that it's not insane ...)

You say "I know what I'm doing in this regard"; can you give some more of the context? Some alternative possibilities (although this is steering into CrossValidated territory) -- the best answer depends on where your data really come from (i.e., why you have "count-like" data that are non-integer but you think should be treated as Poisson).

  • a quasi-Poisson model (family=quasipoisson). (R will still not give you log-likelihood or AIC values in this case, because technically they don't exist -- you're supposed to do inference on the basis of the Wald statistics of the parameters; see e.g. here for more info.)
  • a Gamma model (probably with a log link)
  • if the data started out as count data that you've scaled by some measure of effort or exposure), use an appropriate offset model ...
  • a generalized least-squares model (nlme::gls) with an appropriate heteroscedasticity specification

Upvotes: 6

Related Questions