Denis  Korzhenkov
Denis Korzhenkov

Reputation: 383

scipy.integrate.quad precision on big numbers

I try to compute such an integral (actually cdf of exponential distribution with its pdf) via scipy.integrate.quad():

import numpy as np
from scipy.integrate import quad

def g(x):
    return .5 * np.exp(-.5 * x)

print quad(g, a=0., b=np.inf)
print quad(g, a=0., b=10**6)
print quad(g, a=0., b=10**5)
print quad(g, a=0., b=10**4)

And the result is as follows:

(1.0, 3.5807346295637055e-11)
(0.0, 0.0)
(3.881683817604194e-22, 7.717972744764185e-22)
(1.0, 1.6059202674761255e-14)

All the attempts to use a big upper integration limit yield an incorrect answer though the usage of np.inf solves the problem.

Similiar case is discussed in scipy issue #5428 at GitHub.

What should I do to avoid such an error in integrating other density functions?

Upvotes: 4

Views: 6950

Answers (2)

pv.
pv.

Reputation: 35135

Use the points argument to tell the algorithm where the support of your function roughly is:

import numpy as np
from scipy.integrate import quad

def g(x):
    return .5 * np.exp(-.5 * x)

print quad(g, a=0., b=10**3, points=[1, 100])
print quad(g, a=0., b=10**6, points=[1, 100])
print quad(g, a=0., b=10**9, points=[1, 100])
print quad(g, a=0., b=10**12, points=[1, 100])

Upvotes: 3

Stelios
Stelios

Reputation: 5521

I believe the issue is due to np.exp(-x) quickly becoming very small as x increases, which results in evaluating as zero due to limited numerical precision. For example, even for x as small as x=10**2*, np.exp(-x) evaluates to 3.72007597602e-44, whereas x values of order 10**3 or above result in 0.

I do not know the implementation specifics of quad, but it probably performs some kind of sampling of the function to be integrated over the given integration range. For a large upper integration limit, most of the samples of np.exp(-x) evaluate to zero, hence the integral value is underestimated. (Note that in these cases the provided absolute error by quad is of the same order as the integral value which is an indicator that the latter is unreliable.)

One approach to avoid this issue is to restrict the integration upper bound to a value above which the numerical function becomes very small (and, hence, contributes marginally to the integral value). From your code snipet, the value of 10**4 appears to be a good choice, however, a value of 10**2 also results in an accurate evaluation of the integral.

Another approach to avoid numerical precision issues is to use a module that performs computation in arbitrary precision arithmetic, such as mpmath. For example, for x=10**5, mpmath evaluates exp(-x) as follows (using the native mpmath exponential function)

import mpmath as mp
print(mp.exp(-10**5))

3.56294956530937e-43430

Note how small this value is. With the standard hardware numerical precision (used by numpy) this value becomes 0.

mpmath offers an integration function (mp.quad), which can provide an accurate estimate of the integral for arbitrary values of the upper integral bound.

import mpmath as mp

print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, mp.inf]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**13]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**8]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**5]))
1.0
0.999999650469474
0.999999999996516
0.999999999999997

We can also obtain even more accurate estimates by increasing the precision to, say, 50 decimal points (from 15 which is the standard precision)

mp.mp.dps = 50; 

print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, mp.inf]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**13]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**8]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**5]))
1.0
0.99999999999999999999999999999999999999999829880262
0.99999999999999999999999999999999999999999999997463
0.99999999999999999999999999999999999999999999999998

In general, the cost for obtaining this accuracy is an increased computation time.

P.S.: It goes without saying that if you are able to evaluate your integral analytically in the first place (e.g., with the help of Sympy) you can forget all the above.

Upvotes: 6

Related Questions