Shep
Shep

Reputation: 8370

Poisson confidence interval with numpy

I'm trying to put Poisson continuous error bars on a histogram I'm making with matplotlib, but I can't seem to find a numpy function that will given me a 95% confidence interval assuming poissonian data. Ideally the solution doesn't depend on scipy, but anything will work. Does such a function exist? I've found a lot about bootstrapping but this seems a bit excessive in my case.

Upvotes: 12

Views: 9723

Answers (3)

Steven Ehlert
Steven Ehlert

Reputation: 109

This problem comes up a lot in astronomy (my field!) and this paper is the go-to reference for these confidence intervals: Gehrels 1980

It has a lot of math in it for an arbitrary confidence interval with Poisson statistics, but for a two-sided 95% confidence interval (corresponding to a 2-sigma Gaussian confidence interval, or S=2 in the context of this paper) some simple analytic formulae for the upper and lower confidence limits for when N events are measured are

upper = N + 2. * np.sqrt(N + 1) + 4. / 3.
lower = N * (1. - 1. / (9. * N) - 2. / (3. * np.sqrt(N))) ** 3.

where I have put them in Python format for you already. All you need is numpy or your other favorite square root module. Keep in mind that these will give you the upper and lower limits for the events - not the +/- values. You just subtract off N from both of these to get those.

Please consult the paper for the accuracy of these formulae for the confidence interval you need, but these should be more than sufficiently accurate for most practical applications.

Upvotes: 2

Jaime
Jaime

Reputation: 67427

Using scipy.stats.poisson, and the interval method:

>>> scipy.stats.poisson.interval(0.95, [10, 20, 30])
(array([  4.,  12.,  20.]), array([ 17.,  29.,  41.]))

Even though it only makes limited sense to compute the Poisson distribution for non-integer values, the exact confidence intervals requested by the OP can be computed it can be done as follows:

>>> data = np.array([10, 20, 30])
>>> scipy.stats.poisson.interval(0.95, data)
(array([  4.,  12.,  20.]), array([ 17.,  29.,  41.]))
>>> np.array(scipy.stats.chi2.interval(.95, 2 * data)) / 2 - 1
array([[  3.7953887 ,  11.21651959,  19.24087402],
       [ 16.08480345,  28.67085357,  40.64883744]])

It's also possible to use the ppf method:

>>> data = np.array([10, 20, 30])
>>> scipy.stats.poisson.ppf([0.025, 0.975], data[:, None])
array([[  4.,  17.],
       [ 12.,  29.],
       [ 20.,  41.]])

But because the distribution is discrete the return values will be integers, and the confidence interval will not span 95% exactly:

>>> scipy.stats.poisson.ppf([0.025, 0.975], 10)
array([  4.,  17.])
>>> scipy.stats.poisson.cdf([4, 17], 10)
array([ 0.02925269,  0.98572239])

Upvotes: 9

Shep
Shep

Reputation: 8370

I ended up writing my own function based on some properties I found on Wikipedia.

def poisson_interval(k, alpha=0.05): 
    """
    uses chisquared info to get the poisson interval. Uses scipy.stats 
    (imports in function). 
    """
    from scipy.stats import chi2
    a = alpha
    low, high = (chi2.ppf(a/2, 2*k) / 2, chi2.ppf(1-a/2, 2*k + 2) / 2)
    if k == 0: 
        low = 0.0
    return low, high

This returns continuous (rather than discrete) bounds, which is more standard in my field.

Upvotes: 14

Related Questions