Reputation: 49
I am currently optimizing the runtime of my code, and it is still not within the time consumption bounds I would like it to be. I have gotten to the point where 80% of the time is spent on running lambdify() on my sympy Matrix expressions and evaluating the resulting lambda functions when performing gauss quadrature. All other aspects of the code are sufficiently optimized, I was therefore hoping someone could help me optimizing the substantial "bottleneck" in my code of lambdifying and evaluating sympy expressions.
The code is written on a 64-bit Windows 7 machine with Python 3.5.2 (examples below, illustrating the code, are performed on Jupyter QtConsole) and the following module versions:
The reason for lambdify() using a lot of time I believe is the complexity of the sympy expression (which involves multiplications of sympy Piecewise() expressions). Simplification of these expressions are not possible, as they are wavelet functions created from Legendre scaling functions using the standard Alpert algorithm. A smaller example of such a Matrix and time comparison with lambdifying a "simpler" Matrix is given here:
from sympy import *
import numpy as np
import timeit
xi1 = symbols('xi1')
xi2 = symbols('xi2')
M = Matrix([[-0.0015625*(3.46410161513775*(0.00624999999999998*xi2 -
0.99375)*Piecewise((-1, 0.00624999999999998*xi2 - 0.99375 >= 0),
(1, 0.00624999999999998*xi2 - 0.99375 < 0)) +
1.73205080756888)*Piecewise((1, And(0.00624999999999998*xi2 -
0.99375 <= 1, 0.00624999999999998*xi2 -
0.99375 >= -1)), (0, True))],
[-0.00156249999999999*(0.0187499999999999*xi2 + 2.0*Piecewise((-1,
0.00624999999999998*xi2 - 0.99375 >= 0), (1,
0.00624999999999998*xi2 - 0.99375 < 0)) - 2.98125)*Piecewise((1,
And(0.00624999999999998*xi2 - 0.99375 <= 1,
0.00624999999999998*xi2 - 0.99375 >= -1)), (0, True))],
[-0.00270632938682636*xi1*(3.46410161513775*
(0.00624999999999998*xi2 - 0.99375)*Piecewise((-1,
0.00624999999999998*xi2 - 0.99375 >= 0), (1,
0.00624999999999998*xi2 - 0.99375 < 0)) +
1.73205080756888)*Piecewise((1, And(0.00624999999999998*xi2 -
0.99375 <= 1, 0.00624999999999998*xi2 - 0.99375 >= -1)), (0,
True))]])
M_simpl = Matrix([(xi2**2),(xi2**2)*xi1,(xi2**2)*(xi1**2)])
Time comparison yields:
import timeit
%timeit lambdify([xi1,xi2], M, 'numpy')
10 loops, best of 3: 23 ms per loop
%timeit lambdify([xi1,xi2], M_simpl, 'numpy')
100 loops, best of 3: 2.47 ms per loop
This shows that the more complex expressions are handled almost 10x slower than the simpler Matrix, which makes a significant contribution to the runtime when lambdify() is applied to several of these types of matrices. Researching the subject I have learned of the faster ufuncify() function in sympy.utilities.autowrap, which seems to work best using a Fortran or C++ backend. However, this is not the best alternative in my case, as the function does not yet extend to sympy Matrices and I would like the code to be general enough s.t. other Windows users adapting the code does not need to install C++ compiler etc. So, is there anyway of achieving a speed up of the lambdify() function for these types of sympy expressions without using other compilers?
The lambdifyed functions of the sympy Matrices above also performs different when it comes to evaluation at specific coordinates. This is illustrated with the following simple 5-point quadrature example:
# Quadrature coordinates
xi_v = np.array([[-1,-1], [-0.5,-0.5], [0,0], [0.5,0.5], [1,1]])
# Quadrature weights
w = np.array([3, 2, 1, 2, 3])
# Quadrature
def quad_func(func, xi_v, w):
G = np.zeros((3, 1))
for i in range(0, len(w), 1):
G += w[i]*func(*xi_v[i,:])
return G
# Testing time usage
f = lambdify([xi1,xi2], M, 'numpy')
%timeit quad_func(f, xi_v, w)
1000 loops, best of 3: 852 µs per loop
f_simpl = lambdify([xi1,xi2], M_simpl, 'numpy')
%timeit quad_func(f_simpl, xi_v, w)
10000 loops, best of 3: 33.9 µs per loop
My first instinct was to introduce jit from the numba module in order to speed up the evaluation. However, this resulted in a pop-up window stating that python has stopped working, and the kernel is restarted (Happens for both f and f_simpl):
import numba
quad_func_jit = numba.jit(quad_func)
quad_func_jit(f, xi_v, w)
Kernel died, restarting
So again, is there anyway to speed up these lambda function evaluations in order to reduce the total runtime? Or possibly some way of avoiding the crash for numba.jit?
Upvotes: 4
Views: 1217
Reputation: 312
I was interested in the code that lambdify
produces (lambdify converts sympy syntax into numpy code), so using the inspect
module I printed it:
f = lambdify([xi1,xi2], M, 'numpy')
import inspect
lines = inspect.getsource(f)
print(lines)
(The code for f
can be taken from the question, I will not repeat it here for the sake of brevity) The print statement gave me this massive function, which seems to be correct:
def _lambdifygenerated(xi1, xi2):
return array(
[[(-0.0015625*(0.0216506350946109*xi2 - 3.44245098004314)
*select([greater_equal(0.00624999999999998*xi2 - 0.99375, 0),True],
[-1,1], default=nan) - 0.00270632938682638)
*select([logical_and.reduce((greater_equal(0.00624999999999998*xi2
- 0.99375, -1),
less_equal(0.00624999999999998*xi2
- 0.99375, 1))),True], [1,0],
default=nan)], [
(-2.92968749999997e-5*xi2 - 0.00312499999999998
*select([
greater_equal(0.00624999999999998*xi2 - 0.99375, 0),True], [-1,1],
default=nan) + 0.00465820312499997)
*select([logical_and.
reduce((greater_equal(0.00624999999999998*xi2 - 0.99375, -1),
less_equal(0.00624999999999998*xi2 - 0.99375, 1))),
True], [1,0], default=nan)],
[-0.00270632938682636*xi1*((0.0216506350946109*xi2
- 3.44245098004314)*
select([greater_equal(0.00624999999999998*
xi2 - 0.99375, 0),True], [-1,1], default=nan)
+ 1.73205080756888)*select([logical_and.reduce((greater_equal(
0.00624999999999998*xi2 - 0.99375, -1),
less_equal(0.00624999999999998*xi2 - 0.99375, 1))),
True], [1,0], default=nan)]])
However, this function uses a lot of non-numba supported numpy functions, such as select
. Which makes it impossible to use numba. So to answer your question: No, it is (sadly) not possible to combine lambdify and numba to create JIT compiled sympy answers
Upvotes: 1