Reputation: 8370
I'm trying to put Poisson continuous error bars on a histogram I'm making with matplotlib, but I can't seem to find a numpy function that will given me a 95% confidence interval assuming poissonian data. Ideally the solution doesn't depend on scipy, but anything will work. Does such a function exist? I've found a lot about bootstrapping but this seems a bit excessive in my case.
Upvotes: 12
Views: 9723
Reputation: 109
This problem comes up a lot in astronomy (my field!) and this paper is the go-to reference for these confidence intervals: Gehrels 1980
It has a lot of math in it for an arbitrary confidence interval with Poisson statistics, but for a two-sided 95% confidence interval (corresponding to a 2-sigma Gaussian confidence interval, or S=2 in the context of this paper) some simple analytic formulae for the upper and lower confidence limits for when N events are measured are
upper = N + 2. * np.sqrt(N + 1) + 4. / 3.
lower = N * (1. - 1. / (9. * N) - 2. / (3. * np.sqrt(N))) ** 3.
where I have put them in Python format for you already. All you need is numpy or your other favorite square root module. Keep in mind that these will give you the upper and lower limits for the events - not the +/- values. You just subtract off N from both of these to get those.
Please consult the paper for the accuracy of these formulae for the confidence interval you need, but these should be more than sufficiently accurate for most practical applications.
Upvotes: 2
Reputation: 67427
Using scipy.stats.poisson
, and the interval
method:
>>> scipy.stats.poisson.interval(0.95, [10, 20, 30])
(array([ 4., 12., 20.]), array([ 17., 29., 41.]))
Even though it only makes limited sense to compute the Poisson distribution for non-integer values, the exact confidence intervals requested by the OP can be computed it can be done as follows:
>>> data = np.array([10, 20, 30])
>>> scipy.stats.poisson.interval(0.95, data)
(array([ 4., 12., 20.]), array([ 17., 29., 41.]))
>>> np.array(scipy.stats.chi2.interval(.95, 2 * data)) / 2 - 1
array([[ 3.7953887 , 11.21651959, 19.24087402],
[ 16.08480345, 28.67085357, 40.64883744]])
It's also possible to use the ppf
method:
>>> data = np.array([10, 20, 30])
>>> scipy.stats.poisson.ppf([0.025, 0.975], data[:, None])
array([[ 4., 17.],
[ 12., 29.],
[ 20., 41.]])
But because the distribution is discrete the return values will be integers, and the confidence interval will not span 95% exactly:
>>> scipy.stats.poisson.ppf([0.025, 0.975], 10)
array([ 4., 17.])
>>> scipy.stats.poisson.cdf([4, 17], 10)
array([ 0.02925269, 0.98572239])
Upvotes: 9
Reputation: 8370
I ended up writing my own function based on some properties I found on Wikipedia.
def poisson_interval(k, alpha=0.05):
"""
uses chisquared info to get the poisson interval. Uses scipy.stats
(imports in function).
"""
from scipy.stats import chi2
a = alpha
low, high = (chi2.ppf(a/2, 2*k) / 2, chi2.ppf(1-a/2, 2*k + 2) / 2)
if k == 0:
low = 0.0
return low, high
This returns continuous (rather than discrete) bounds, which is more standard in my field.
Upvotes: 14