Optimizing runtime with lambdify and function evaluation

Question

I am currently optimizing the runtime of my code, and it is still not within the time consumption bounds I would like it to be. I have gotten to the point where 80% of the time is spent on running lambdify() on my sympy Matrix expressions and evaluating the resulting lambda functions when performing gauss quadrature. All other aspects of the code are sufficiently optimized, I was therefore hoping someone could help me optimizing the substantial "bottleneck" in my code of lambdifying and evaluating sympy expressions.

The code is written on a 64-bit Windows 7 machine with Python 3.5.2 (examples below, illustrating the code, are performed on Jupyter QtConsole) and the following module versions:

Sympy: 1.0
Numpy: 1.11.1
Numba: 0.27

Lambdify()

The reason for lambdify() using a lot of time I believe is the complexity of the sympy expression (which involves multiplications of sympy Piecewise() expressions). Simplification of these expressions are not possible, as they are wavelet functions created from Legendre scaling functions using the standard Alpert algorithm. A smaller example of such a Matrix and time comparison with lambdifying a "simpler" Matrix is given here:

from sympy import *
import numpy as np
import timeit

xi1 = symbols('xi1')
xi2 = symbols('xi2')
M = Matrix([[-0.0015625*(3.46410161513775*(0.00624999999999998*xi2 - 
           0.99375)*Piecewise((-1, 0.00624999999999998*xi2 - 0.99375 >= 0), 
           (1, 0.00624999999999998*xi2 - 0.99375 < 0)) + 
           1.73205080756888)*Piecewise((1, And(0.00624999999999998*xi2 - 
           0.99375 <= 1, 0.00624999999999998*xi2 - 
           0.99375 >= -1)), (0, True))], 
          [-0.00156249999999999*(0.0187499999999999*xi2 + 2.0*Piecewise((-1, 
           0.00624999999999998*xi2 - 0.99375 >= 0), (1, 
           0.00624999999999998*xi2 - 0.99375 < 0)) - 2.98125)*Piecewise((1, 
           And(0.00624999999999998*xi2 - 0.99375 <= 1, 
           0.00624999999999998*xi2 - 0.99375 >= -1)), (0, True))], 
          [-0.00270632938682636*xi1*(3.46410161513775*
           (0.00624999999999998*xi2 - 0.99375)*Piecewise((-1, 
           0.00624999999999998*xi2 - 0.99375 >= 0), (1, 
           0.00624999999999998*xi2 - 0.99375 < 0)) + 
           1.73205080756888)*Piecewise((1, And(0.00624999999999998*xi2 - 
           0.99375 <= 1, 0.00624999999999998*xi2 - 0.99375 >= -1)), (0, 
           True))]])
M_simpl = Matrix([(xi2**2),(xi2**2)*xi1,(xi2**2)*(xi1**2)])

Time comparison yields:

import timeit

%timeit lambdify([xi1,xi2], M, 'numpy')
10 loops, best of 3: 23 ms per loop
%timeit lambdify([xi1,xi2], M_simpl, 'numpy')
100 loops, best of 3: 2.47 ms per loop

This shows that the more complex expressions are handled almost 10x slower than the simpler Matrix, which makes a significant contribution to the runtime when lambdify() is applied to several of these types of matrices. Researching the subject I have learned of the faster ufuncify() function in sympy.utilities.autowrap, which seems to work best using a Fortran or C++ backend. However, this is not the best alternative in my case, as the function does not yet extend to sympy Matrices and I would like the code to be general enough s.t. other Windows users adapting the code does not need to install C++ compiler etc. So, is there anyway of achieving a speed up of the lambdify() function for these types of sympy expressions without using other compilers?

Lambda function evaluation

The lambdifyed functions of the sympy Matrices above also performs different when it comes to evaluation at specific coordinates. This is illustrated with the following simple 5-point quadrature example:

# Quadrature coordinates
xi_v = np.array([[-1,-1], [-0.5,-0.5], [0,0], [0.5,0.5], [1,1]])
# Quadrature weights
w = np.array([3, 2, 1, 2, 3])

# Quadrature
def quad_func(func, xi_v, w):
    G = np.zeros((3, 1))
    for i in range(0, len(w), 1):
        G += w[i]*func(*xi_v[i,:])
    return G

# Testing time usage
f = lambdify([xi1,xi2], M, 'numpy')
%timeit quad_func(f, xi_v, w)
1000 loops, best of 3: 852 µs per loop
f_simpl = lambdify([xi1,xi2], M_simpl, 'numpy')
%timeit quad_func(f_simpl, xi_v, w)
10000 loops, best of 3: 33.9 µs per loop

My first instinct was to introduce jit from the numba module in order to speed up the evaluation. However, this resulted in a pop-up window stating that python has stopped working, and the kernel is restarted (Happens for both f and f_simpl):

import numba

quad_func_jit = numba.jit(quad_func)
quad_func_jit(f, xi_v, w)

Kernel died, restarting

So again, is there anyway to speed up these lambda function evaluations in order to reduce the total runtime? Or possibly some way of avoiding the crash for numba.jit?

Optimizing runtime with lambdify and function evaluation

Lambdify()

Lambda function evaluation

Answers (1)

Related Questions