Reputation: 81
I have a non-linear optimization problem with a constraint and upper/lower bounds, so with scipy I have to use SLSQP. The problem is clearly not convex. I got the Jacobian of both the objective and constraint functions to work correctly (results are good/fast up to a 300-element input vector). All functions are vectorized and tuned to run very fast. The problem is that with a 1000+ element input vector it takes ages, even though I can see the minimizer is not calling my functions a lot (objective/constraint/gradients) and seems to spend most of its processing time internally. I read somewhere that the performance of SLSQP is O(n^3).
Is there a better/faster SLSQP implementation, or another method for this type of problem, for Python? I tried nlopt and it somehow returns wrong results given the exact same functions I use in scipy (with a wrapper to adapt to its method signature). I also failed to use ipopt with the pyipopt package; I cannot get working ipopt binaries to work with the Python wrapper.
UPDATE: in case it helps, my input variable is basically a vector of (x,y) tuples, i.e. points on a 2D surface representing coordinates. With 1000 points I end up with a 2000-dimensional input vector. The function I want to optimize calculates the optimum positions of the points relative to each other, taking into consideration their relationships and other constraints. So the problem is not sparse.
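To illustrate the setup, here is a minimal sketch of the call I make (the objective, constraint and their gradients below are trivial placeholders, not my real vectorized functions):

import numpy as np
from scipy.optimize import minimize

# Placeholder objective/constraint (hypothetical -- substitute the real vectorized
# functions and their analytic gradients).
def cost(z):
    return np.sum(z**2)

def cost_grad(z):
    return 2.0 * z

def constr(z):                 # SLSQP expects inequality constraints as g(z) >= 0
    return np.sum(z) - 1.0

def constr_grad(z):
    return np.ones_like(z)

n_points = 150
z0 = np.random.rand(2 * n_points)        # flattened (x0, y0, x1, y1, ...) coordinates
bounds = [(0.0, 1.0)] * (2 * n_points)   # per-coordinate lower/upper bounds

res = minimize(cost, z0, jac=cost_grad, method='SLSQP', bounds=bounds,
               constraints=[{'type': 'ineq', 'fun': constr, 'jac': constr_grad}],
               options={'maxiter': 200})
print(res.message)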
Thanks...
Upvotes: 3
Views: 13063
Reputation: 336
In my opinion scipy.optimize.minimize provides an intuitive interface for optimization. I've found that speeding up the cost function (and possibly the gradient function) can give you nice speedups.
For example, take the N-dimensional Rosenbrock function:
import numpy as np
from scipy.optimize import minimize
def rosenbrock(x, N):
    out = 0.0
    for i in range(N-1):
        out += 100.0 * (x[i+1] - x[i]**2)**2 + (1 - x[i])**2
    return out
# slow optimize
N = 20
x_0 = - np.ones(N)
%timeit minimize(rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4})
res = minimize(rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4})
print(res.message)
Timing of the optimization yields
102 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Optimization terminated successfully.
Now you could speed up the objective function using numba, and provide a naive forward-difference function to compute the gradient, like this:
from numba import jit, float64, int64

@jit(float64(float64[:], int64), nopython=True, parallel=True)
def fast_rosenbrock(x, N):
    out = 0.0
    for i in range(N-1):
        out += 100.0 * (x[i+1] - x[i]**2)**2 + (1 - x[i])**2
    return out

@jit(float64[:](float64[:], int64), nopython=True, parallel=True)
def fast_jac(x, N):
    h = 1e-9
    jac = np.zeros_like(x)
    f_0 = fast_rosenbrock(x, N)
    for i in range(N):
        x_d = np.copy(x)
        x_d[i] += h
        f_d = fast_rosenbrock(x_d, N)
        jac[i] = (f_d - f_0) / h
    return jac
This is basically just adding a decorator to the objective function, allowing parallel computation. Now we can time the optimization again:
print('with fast jacobian')
%timeit minimize(fast_rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4}, jac=fast_jac)
print('without fast jacobian')
%timeit minimize(fast_rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4})
res = minimize(fast_rosenbrock, x_0, args=(N,), method='SLSQP', options={'maxiter': 1e4}, jac=fast_jac)
print(res.message)
This tries both cases, with and without providing the fast Jacobian function. The output is:
with fast jacobian
9.67 ms ± 488 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
without fast jacobian
27.2 ms ± 2.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Optimization terminated successfully.
That is roughly a 10x speedup for little effort. The improvement you can achieve this way depends strongly on how inefficient your cost function is. I had a cost function with several calculations in it and was able to get speedups of around 10^2 - 10^3.
The advantage of this approach is that it takes little effort and you can stay with scipy and its nice interface.
Upvotes: 7
Reputation: 81
Surprisingly, I found a relatively OK solution using an optimizer from a deep learning framework, Tensorflow, with basic gradient descent (actually RMSProp, an adaptive variant of gradient descent), after I changed the cost function to include the inequality constraint and the bounding constraints as penalties (I suppose this is the same as the Lagrangian/penalty method). It trains super fast and converges quickly with proper lambda parameters on the constraint penalties. I didn't even have to rewrite the Jacobians, as TF takes care of that, apparently without much speed impact.
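Roughly, the penalized-cost idea looks like this (a sketch only: the real cost/constraint and the lambda weights are problem-specific, and the placeholders below are purely illustrative):

import tensorflow as tf

n_points = 1000
z = tf.Variable(tf.random.uniform([n_points, 2]))   # (x, y) coordinates to optimize
lam_ineq, lam_box = 10.0, 10.0                       # illustrative penalty weights

def cost(z):                        # stand-in for the real pairwise cost
    return tf.reduce_sum(tf.square(z))

def ineq(z):                        # stand-in constraint, want ineq(z) >= 0
    return tf.reduce_sum(z) - 1.0

def penalized_cost(z):
    # fold the inequality constraint and the [0, 1] box bounds in as quadratic penalties
    c = cost(z)
    c += lam_ineq * tf.square(tf.nn.relu(-ineq(z)))
    c += lam_box * tf.reduce_sum(tf.square(tf.nn.relu(-z)) + tf.square(tf.nn.relu(z - 1.0)))
    return c

opt = tf.keras.optimizers.RMSprop(learning_rate=0.01)
for _ in range(1000):
    with tf.GradientTape() as tape:
        loss = penalized_cost(z)
    opt.apply_gradients([(tape.gradient(loss, z), z)])

TF computes the gradient of the penalized cost automatically, which is why I did not have to rewrite the Jacobians.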
Before that, I managed to get NLOPT to work, and it is much faster than scipy/SLSQP but still slow in higher dimensions. Also, NLOPT/AUGLAG is super fast but converges poorly.
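For reference, the NLopt call pattern looks roughly like this (placeholders only). Two details that are easy to get wrong when wrapping scipy-style functions: the gradient must be written into the grad argument in place, and NLopt uses g(x) <= 0 for inequality constraints, the opposite sign convention from scipy:

import numpy as np
import nlopt

def objective(z, grad):
    if grad.size > 0:
        grad[:] = 2.0 * z          # gradient written in place (placeholder)
    return float(np.sum(z**2))     # placeholder objective

def inequality(z, grad):           # NLopt wants g(z) <= 0, unlike scipy's g(z) >= 0
    if grad.size > 0:
        grad[:] = -np.ones_like(z)
    return float(1.0 - np.sum(z))

n = 2 * 150
opt = nlopt.opt(nlopt.LD_SLSQP, n)
opt.set_min_objective(objective)
opt.set_lower_bounds(np.zeros(n))
opt.set_upper_bounds(np.ones(n))
opt.add_inequality_constraint(inequality, 1e-8)
opt.set_xtol_rel(1e-6)
z_opt = opt.optimize(np.random.rand(n))
print(opt.last_optimum_value())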
That said, at 20k variables it is still slow, partly due to memory swapping and the cost function being at least O(n^2) because of the pairwise Euclidean distances (I use (x - x.T)^2 + (y - y.T)^2 with broadcasting). So it's still not optimal.
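The distance computation I mean, sketched with random data (each n x n float64 matrix is about 3.2 GB at n = 20000, hence the swapping):

import numpy as np

n = 1000                                # at n = 20000 each n x n float64 matrix is ~3.2 GB
pts = np.random.rand(n, 2)
x = pts[:, 0:1]                         # (n, 1) column vectors; broadcasting against the
y = pts[:, 1:2]                         # (1, n) transpose produces full (n, n) matrices

# pairwise squared Euclidean distances -- O(n^2) in both time and memory
sq_dist = (x - x.T)**2 + (y - y.T)**2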
Upvotes: 2
Reputation: 16724
We don't know much about the model, but here are some notes:
Upvotes: 7