Riad Souissi

Reputation: 81

scipy.optimize.minimize('SLSQP') too slow when given 2000 dim variable

I have a non-linear optimization problem with a constraint and upper/lower bounds, so with scipy I have to use SLSQP. The problem is clearly not convex. I got the Jacobians of both the objective and constraint functions to work correctly (results are good and fast up to an input vector of about 300 dimensions). All functions are vectorized and tuned to run very fast. The problem is that with a 1000+ dimensional input vector the optimization takes ages, even though I can see the minimizer is not calling my functions very often (objective/constraint/gradients) and seems to spend most of its processing time internally. I read somewhere that the performance of SLSQP is O(n^3).

Is there a better/faster SLSQP implementation, or another method suited to this type of problem, for Python? I tried nlopt, but it somehow returns wrong results given the exact same functions I use in scipy (with a wrapper to adapt to its method signature). I also failed to use ipopt with the pyipopt package; I cannot get working ipopt binaries to work with the Python wrapper.

UPDATE: if it helps, my input variable is basically a vector of (x, y) tuples, i.e. points in a 2D plane representing coordinates. With 1000 points, I end up with a 2000-dimensional input vector. The function I want to optimize computes the optimum positions of the points relative to each other, taking into consideration their relationships and other constraints. So the problem is not sparse.
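For concreteness, here is a minimal sketch of that layout (illustrative names only, not my actual objective):

import numpy as np

# N points in the plane, flattened into one 2N-dimensional vector so that
# scipy.optimize.minimize can treat them as a single variable
pts = np.random.rand(1000, 2)        # 1000 (x, y) coordinates
x0 = pts.ravel()                     # 2000-dimensional starting vector

def objective(x):
    p = x.reshape(-1, 2)             # recover the (N, 2) point array inside the cost
    # toy stand-in for the real cost, built from pairwise relationships
    d2 = np.sum((p[:, None, :] - p[None, :, :]) ** 2, axis=-1)
    return np.sum(d2)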

Thanks...

Upvotes: 3

Views: 13063

Answers (3)

ccssmnn

Reputation: 336

In my opinion scipy.minimize provides an intuitive interface for optimization. I've found that speeding up the cost function (and possibly also providing a gradient function) can give you nice speedups.

For example take the N-Dimensional Rosenbrock function:

import numpy as np
from scipy.optimize import minimize


def rosenbrock(x, N):
    out = 0.0
    for i in range(N-1):
        out += 100.0 * (x[i+1] - x[i]**2)**2 + (1 - x[i])**2
    return out

# slow optimize
N = 20
x_0 = - np.ones(N)
%timeit minimize(rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4})
res = minimize(rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4})
print(res.message)

Timing of the optimization yields

102 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Optimization terminated successfully.

Now you could speed up the objective function using numba, and provide a naive function to compute the gradient like this:

from numba import jit, float64, int64

@jit(float64(float64[:], int64), nopython=True, parallel=True)
def fast_rosenbrock(x, N):
    out = 0.0
    for i in range(N-1):
        out += 100.0 * (x[i+1] - x[i]**2)**2 + (1 - x[i])**2
    return out


@jit(float64[:](float64[:], int64), nopython=True, parallel=True)
def fast_jac(x, N):
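    # naive forward-difference approximation of the gradient, step size h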
    h = 1e-9
    jac = np.zeros_like(x)
    f_0 = fast_rosenbrock(x, N)
    for i in range(N):
        x_d = np.copy(x)
        x_d[i] += h
        f_d = fast_rosenbrock(x_d, N)
        jac[i] = (f_d - f_0) / h
    return jac

This basically amounts to adding a decorator to the objective function, allowing compiled (and potentially parallel) computation. Now we can time the optimization again:

print('with fast jacobian')
%timeit minimize(fast_rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4}, jac=fast_jac)
print('without fast jacobian')
%timeit minimize(fast_rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4})
res = minimize(fast_rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4}, jac=fast_jac)
print(res.message)

This tries both with and without providing the fast Jacobian function. The output is:

with fast jacobian
9.67 ms ± 488 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
without fast jacobian
27.2 ms ± 2.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Optimization terminated successfully.

That is roughly a 10x speedup for little effort. The improvement you can achieve this way depends heavily on how inefficient your cost function is to begin with. I had a cost function with several calculations in it and was able to get speedups of around 10^2 to 10^3.

The advantage of this approach is that it takes little effort and that you can stay with scipy and its nice interface.
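Beyond that (not part of the original answer), if you are willing to derive the gradient by hand, the finite-difference fast_jac above can be replaced by an exact Jacobian, which avoids the N extra cost-function evaluations per gradient call:

@jit(float64[:](float64[:], int64), nopython=True)
def exact_jac(x, N):
    # analytic gradient of the chained Rosenbrock function: term i contributes
    # to the partial derivatives with respect to x[i] and x[i+1]
    jac = np.zeros_like(x)
    for i in range(N - 1):
        jac[i] += -400.0 * x[i] * (x[i+1] - x[i]**2) - 2.0 * (1.0 - x[i])
        jac[i+1] += 200.0 * (x[i+1] - x[i]**2)
    return jac

Passing jac=exact_jac to minimize in place of fast_jac should typically cut the number of function evaluations further; the exact benefit depends on the problem.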

Upvotes: 7

Riad Souissi

Reputation: 81

Surprisingly, I found a relatively OK solution using an optimizer from a deep learning framework, TensorFlow, with basic gradient descent (actually RMSProp, gradient descent with momentum), after I changed the cost function to include the inequality constraint and the bound constraints as penalties (I suppose this is similar to a penalty/Lagrangian method). It trains super fast and converges quickly with proper lambda parameters on the constraint penalties. I didn't even have to rewrite the Jacobians, as TF takes care of that, apparently without much speed impact.
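For illustration, here is a minimal sketch of that penalty approach written against the TF2 eager API (the objective, constraint and bounds below are placeholders, not my actual functions):

import tensorflow as tf

# stand-ins for the real problem (illustrative only)
N = 1000
lower, upper = 0.0, 1.0              # box bounds on the coordinates
lam_ineq, lam_bound = 10.0, 10.0     # penalty weights, tuned by hand

pts = tf.Variable(tf.random.uniform((N, 2), lower, upper))
opt = tf.keras.optimizers.RMSprop(learning_rate=1e-2)

def objective(p):                    # stand-in for the real cost
    return tf.reduce_sum(p ** 2)

def constraint(p):                   # stand-in for the real inequality g(p) <= 0
    return tf.reduce_sum(p, axis=1) - 1.0

def penalized_cost():
    cost = objective(pts)
    g = constraint(pts)
    cost += lam_ineq * tf.reduce_sum(tf.nn.relu(g) ** 2)                 # constraint violation
    cost += lam_bound * tf.reduce_sum(tf.nn.relu(pts - upper) ** 2
                                      + tf.nn.relu(lower - pts) ** 2)    # bound violation
    return cost

for step in range(2000):
    with tf.GradientTape() as tape:  # TF computes the gradients, no hand-written Jacobian
        loss = penalized_cost()
    grads = tape.gradient(loss, [pts])
    opt.apply_gradients(zip(grads, [pts]))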

Before that, I managed to get NLopt to work, and it is much faster than scipy/SLSQP but still slow in higher dimensions. Also, NLopt's AUGLAG (augmented Lagrangian) is super fast but converges poorly.

That said, at 20k variables it is still slow, partly due to memory swapping and to the cost function being at least O(n^2) because of the pairwise Euclidean distances (I use (x - x.T)^2 + (y - y.T)^2 with broadcasting). So still not optimal.
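For reference, a minimal NumPy sketch of that broadcasting pattern (illustrative names), which makes the O(n^2) footprint explicit:

import numpy as np

def pairwise_sq_dist(pts):
    # pts: (N, 2) array of (x, y) coordinates
    x = pts[:, 0:1]                  # column vector, shape (N, 1)
    y = pts[:, 1:2]
    # broadcasting (N, 1) against (1, N) builds full N x N matrices,
    # so both time and memory grow as O(n^2)
    return (x - x.T) ** 2 + (y - y.T) ** 2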

Upvotes: 2

Erwin Kalvelagen

Reputation: 16724

We don't know much about the model, but here are some notes:

  1. SLSQP is really designed for small (dense), well-scaled models.
  2. SLSQP is a local solver. It will accept non-convex problems but will only provide local solutions.
  3. I doubt there is such a complexity bound for SLSQP. In any case, it does not say much about performance on a particular problem.
  4. IPOPT is a large-scale, sparse interior point solver. It will find local solutions. It can solve really large models.
  5. Global solvers like BARON, ANTIGONE and COUENNE find global solutions (or bounds on the quality of the solution if you are impatient and stop prematurely). These solvers are (most of the time) slower than local solvers. I am not aware of direct Python links.
  6. If you have a good starting point, a local solver may be just what you need. Using a multistart strategy can help find better solutions (not proven globally optimal, but you gain some confidence that you have not found a really bad local optimum); a minimal multistart loop is sketched after this list.
  7. The Python framework PYOMO offers access to many solvers. However, you will need to rewrite the model. PYOMO has automatic differentiation: no need to provide gradients.
  8. For testing you can try to transcribe the model in AMPL or GAMS and solve online via NEOS. This will allow you to try out a number of solvers. Both AMPL and GAMS have automatic differentiation.
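Below is a minimal sketch of the multistart idea from point 6, wrapped around the SLSQP setup already in use (the objective, bounds and constraint here are stand-ins for the actual problem):

import numpy as np
from scipy.optimize import minimize

# stand-ins for the actual problem (illustrative only)
n = 10
lb, ub = 0.0, 1.0
bounds = [(lb, ub)] * n
constraints = {'type': 'ineq', 'fun': lambda x: 1.0 - np.sum(x)}  # SLSQP convention: fun(x) >= 0
objective = lambda x: np.sum((x - 0.3) ** 2)

rng = np.random.default_rng(0)
best = None
for _ in range(20):                              # number of random starts
    x0 = rng.uniform(lb, ub, size=n)             # sample a start within the bounds
    res = minimize(objective, x0, method='SLSQP',
                   bounds=bounds, constraints=constraints)
    if res.success and (best is None or res.fun < best.fun):
        best = res                               # keep the best local solution found

print(best.fun, best.x)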

Upvotes: 7
