Thibaultofc

Reputation: 31

Why is SymPy's integrate function much slower when doing a definite integral than an approximation?

Consider f = lambda x: 1/x. I want to compute its definite integral between 2 and 7.

The first method is using a linspace and evaluating a Riemann Sum over 10^4 terms.

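import numpy as np

f = lambda x: 1/x
l = list(np.linspace(2, 7, 10**4))  # 10^4 sample points, both endpoints included

s = 0
for i in l:
    s += f(i)*(l[1]-l[0])  # function value times the sample spacing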

The second method is using SymPy's integrate function and evaluating it.

import sympy as sp

x = sp.symbols('x')
t = sp.integrate(f(x), (x, 2, 7)).evalf()  # exact symbolic integral, then its numeric value

The output is:

Riemann Sum : 1.2529237036377492
--- 13.025045394897461 milliseconds ---


SymPy : 1.25276296849537
--- 71.07734680175781 milliseconds ---


Delta : 0.0128304512843464 %

My question is: why is SymPy around 4 to 5 times slower than a Riemann sum when the two results differ by less than 0.1%, and is there any way to improve either of the two methods?

Upvotes: 0

Views: 302

Answers (1)

hpaulj

Reputation: 231605

sympy is a symbolic/algebraic package, manipulating complex "symbol/expression" objects.

In an isympy session:

In [7]: f = lambda x : 1/x

In [8]: integrate(f(x),(x,2,7)).evalf()
Out[8]: 1.25276296849537

In [9]: integrate(f(x),(x,2,7))
Out[9]: -log(2) + log(7)

In [10]: timeit integrate(f(x),(x,2,7))
10.6 ms ± 26.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [11]: timeit integrate(f(x),(x,2,7)).evalf()
10.8 ms ± 13.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The bulk of the time is spent in the symbolic part, with the final numeric evaluation being relatively fast.
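Since the symbolic step dominates, one way to avoid paying for it repeatedly is to integrate once with symbolic limits and compile the result into a plain numeric function. The following is only a sketch: the symbols a and b and the name F are hypothetical, introduced here for illustration, and it assumes you need the same integral for several sets of bounds.

```python
import math
import sympy as sp

# Hypothetical symbolic limits a and b (not in the original post).
x, a, b = sp.symbols('x a b', positive=True)

# Pay the symbolic cost once.
antideriv = sp.integrate(1/x, (x, a, b))

# Compile the closed form into a function that uses the math module.
F = sp.lambdify((a, b), antideriv, 'math')

print(antideriv)
print(F(2, 7))
```

After the one-time setup, each call to F is ordinary floating-point arithmetic, so SymPy's expression handling is no longer in the timed path.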

Your iterative numeric solution:

In [45]: f = lambda x : 1/x
In [46]: %%timeit
    ...: s = 0
    ...: for i in l:
    ...:     s+=f(i)*(l[1]-l[0])
    ...: 
5.91 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But using numpy we can do that a lot faster:

In [47]: (f(np.array(l))*(l[1]-l[0])).sum()
Out[47]: 1.2529237036377558
In [48]: timeit (f(np.array(l))*(l[1]-l[0])).sum()
631 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

and even better if the input is already an array (your linspace without the list() conversion):

In [49]: %%timeit larr=np.array(l)
    ...: (f(larr)*(l[1]-l[0])).sum()
61.2 µs ± 735 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
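On the accuracy side, part of the 0.013% delta seems to come from the sum itself: it gives both endpoints full weight, so it overshoots by about half a step times f(2) + f(7). A minimal sketch of a trapezoid-weighted version on the same grid follows; the names xs, ys, full_sum and trapezoid are mine, not from the question.

```python
import numpy as np

def f(x):
    return 1 / x

xs = np.linspace(2, 7, 10**4)   # keep the ndarray, no list() conversion
dx = xs[1] - xs[0]
ys = f(xs)                      # one vectorized evaluation

# Same quantity as the loop in the question: every sample at full weight.
full_sum = ys.sum() * dx

# Trapezoid rule: the two endpoint samples get half weight.
trapezoid = (ys.sum() - 0.5 * (ys[0] + ys[-1])) * dx

exact = np.log(7 / 2)
print(full_sum - exact, trapezoid - exact)
```

The trapezoid weighting costs essentially the same as the plain sum but should land much closer to log(7/2) for a smooth integrand like this one.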

scipy has a bunch of integration functions, most of which use compiled libraries like QUADPACK. A basic one is quad:

In [50]: from scipy.integrate import quad
In [52]: quad(f,2,7)
Out[52]: (1.2527629684953678, 3.2979213205748694e-12)
In [53]: timeit quad(f,2,7)
7.22 µs ± 57.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

According to the full_output display, quad only has to call f() 21 times, rather than the 10**4 calls your iteration makes.
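To check the evaluation count yourself, something like the following should work. It is a sketch that relies on quad returning an info dictionary as its third value when full_output is nonzero, with the count stored under the "neval" key.

```python
import math
from scipy.integrate import quad

def f(x):
    return 1 / x

# With full_output=1, a successful call returns (value, abserr, infodict).
value, abserr, info = quad(f, 2, 7, full_output=1)

print(value, abserr)
print("function evaluations:", info["neval"])
```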

Upvotes: 1
