I have a least squares minimization problem subject to inequality constraints which I am trying to solve using scipy.optimize.minimize. It seems that there are two options for inequality constraints: COBYLA and SLSQP.
I first tried SLSQP since it allows for explicit partial derivatives of the function to be minimized. Depending on the scaling of the problem, it fails with the error:
Positive directional derivative for linesearch (Exit mode 8)
whenever interval or more general inequality constraints are imposed.
This has been observed previously, e.g., here. Manual scaling of the function to be minimized (along with the associated partial derivatives) seems to get rid of the problem, but I cannot achieve the same effect by changing ftol in the options (a sketch of the rescaling I mean is shown after the example below).
Overall, this is making me doubt that the routine works robustly. Here's a simplified example:
import numpy as np
import scipy.optimize as sp_optimize
def cost(x, A, y):
    e = y - A.dot(x)
    rss = np.sum(e ** 2)
    return rss

def cost_deriv(x, A, y):
    e = y - A.dot(x)
    deriv0 = -2 * e.dot(A[:, 0])
    deriv1 = -2 * e.dot(A[:, 1])
    deriv = np.array([deriv0, deriv1])
    return deriv
A = np.ones((10,2)); A[:,0] = np.linspace(-5,5, 10)
x_true = np.array([2, 2/20])
y = A.dot(x_true)
x_guess = x_true / 2
prm_bounds = ((0, 3), (0,1))
cons_SLSQP = ({'type': 'ineq', 'fun': lambda x: np.array([x[0] - x[1]]),
               'jac': lambda x: np.array([1.0, -1.0])})
# works correctly
min_res_SLSQP = sp_optimize.minimize(cost, x_guess, args=(A, y), jac=cost_deriv, bounds=prm_bounds, method='SLSQP', constraints=cons_SLSQP, options={'disp': True})
print(min_res_SLSQP)
# fails
A = 100 * A
y = A.dot(x_true)
min_res_SLSQP = sp_optimize.minimize(cost, x_guess, args=(A, y), jac=cost_deriv, bounds=prm_bounds, method='SLSQP', constraints=cons_SLSQP, options={'disp': True})
print(min_res_SLSQP)
# works if bounds and inequality constraints removed
min_res_SLSQP = sp_optimize.minimize(cost, x_guess, args=(A, y), jac=cost_deriv,
                                     method='SLSQP', options={'disp': True})
print(min_res_SLSQP)
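For reference, this is the kind of manual rescaling I mean (a sketch only; the choice of scale factor below is an arbitrary illustration, not something prescribed by SLSQP):

scale = 1.0 / np.sum(y ** 2)  # arbitrary illustrative choice of scaling constant

def cost_scaled(x, A, y):
    # same objective, just multiplied by a constant
    return scale * cost(x, A, y)

def cost_deriv_scaled(x, A, y):
    # the analytic gradient must be scaled by the same constant
    return scale * cost_deriv(x, A, y)

min_res_scaled = sp_optimize.minimize(cost_scaled, x_guess, args=(A, y),
                                      jac=cost_deriv_scaled, bounds=prm_bounds,
                                      method='SLSQP', constraints=cons_SLSQP,
                                      options={'disp': True})
print(min_res_scaled)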
How should ftol be set to avoid failure? More generally, can a similar problem arise with COBYLA? Is COBYLA a better choice for this type of inequality constrained least squares optimization problem?
Using a square root in the cost function was found to improve performance. However, for a non-linear re-parameterization of the problem (simpler but closer to what I need to do in practice), it fails again. Here are the details:
import numpy as np
import scipy.optimize as sp_optimize
def cost(x, y, g):
    e = ((y - x[1]) / x[0]) - g
    rss = np.sqrt(np.sum(e ** 2))
    return rss

def cost_deriv(x, y, g):
    e = ((y - x[1]) / x[0]) - g
    factor = 0.5 / np.sqrt(e.dot(e))
    deriv0 = -2 * factor * e.dot(y - x[1]) / (x[0] ** 2)
    deriv1 = -2 * factor * np.sum(e) / x[0]
    deriv = np.array([deriv0, deriv1])
    return deriv
x_true = np.array([1/300, .1])
N = 20
t = 20 * np.arange(N)
g = 100 * np.cos(2 * np.pi * 1e-3 * (t - t[-1] / 2))
y = g * x_true[0] + x_true[1]
x_guess = x_true / 2
prm_bounds = ((1e-4, 1e-2), (0, .4))
# check derivatives
delta = 1e-9
C0 = cost(x_guess, y, g)
C1 = cost(x_guess + np.array([delta, 0]), y, g)
approx_deriv0 = (C1 - C0) / delta
C1 = cost(x_guess + np.array([0, delta]), y, g)
approx_deriv1 = (C1 - C0) / delta
approx_deriv = np.array([approx_deriv0, approx_deriv1])
deriv = cost_deriv(x_guess, y, g)
# fails
min_res_SLSQP = sp_optimize.minimize(cost, x_guess, args=(y, g), jac=cost_deriv,
                                     bounds=prm_bounds, method='SLSQP', options={'disp': True})
print(min_res_SLSQP)
Instead of minimizing np.sum(e ** 2), minimize sqrt(np.sum(e ** 2)), or better (in terms of calculation): np.linalg.norm(e)!

This modification (replacing rss = np.sum(e ** 2) in cost with rss = np.linalg.norm(e)) is all that is needed: with this change, all cases work, even using numerical differentiation (I was too lazy to modify the gradient, which needs to reflect this!).
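If you do want to keep supplying the analytic gradient for the first example, here is a sketch of what it would become (assuming the residual is nonzero at the evaluation point, where the norm is differentiable):

def cost_norm(x, A, y):
    # 2-norm of the residual instead of the sum of squares
    return np.linalg.norm(y - A.dot(x))

def cost_norm_deriv(x, A, y):
    # gradient of ||y - A.dot(x)|| with respect to x: -(A^T e) / ||e||
    e = y - A.dot(x)
    return -A.T.dot(e) / np.linalg.norm(e)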
Example output (the number of function evaluations gives away the numerical differentiation):
Optimization terminated successfully. (Exit mode 0)
Current function value: 3.815547437029837e-06
Iterations: 16
Function evaluations: 88
Gradient evaluations: 16
fun: 3.815547437029837e-06
jac: array([-6.09663382, -2.48862544])
message: 'Optimization terminated successfully.'
nfev: 88
nit: 16
njev: 16
status: 0
success: True
x: array([ 2.00000037, 0.10000018])
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.0002354577991007501
Iterations: 23
Function evaluations: 114
Gradient evaluations: 23
fun: 0.0002354577991007501
jac: array([ 435.97259208, 288.7483819 ])
message: 'Optimization terminated successfully.'
nfev: 114
nit: 23
njev: 23
status: 0
success: True
x: array([ 1.99999977, 0.10000014])
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.0003392807206384532
Iterations: 21
Function evaluations: 112
Gradient evaluations: 21
fun: 0.0003392807206384532
jac: array([ 996.57340243, 51.19298764])
message: 'Optimization terminated successfully.'
nfev: 112
nit: 21
njev: 21
status: 0
success: True
x: array([ 2.00000008, 0.10000104])
While there are probably some problems with SLSQP, it's still one of the most tested and robust codes for such a broad application spectrum!
I would also expect SLSQP to do much better here than COBYLA, as the latter is based heavily on linearizations. (But just take that as a guess; it's easy to try given the minimize interface, e.g. along the lines of the sketch below.)
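A sketch of a COBYLA attempt on the first example in the question (I have not verified how it converges; COBYLA ignores the jac argument, and the box bounds are rewritten here as inequality constraints rather than relying on the bounds argument):

# box bounds expressed as inequality constraints (fun(x) >= 0 means feasible)
cons_COBYLA = [
    {'type': 'ineq', 'fun': lambda x: x[0] - x[1]},  # x[0] >= x[1]
    {'type': 'ineq', 'fun': lambda x: x[0]},         # x[0] >= 0
    {'type': 'ineq', 'fun': lambda x: 3 - x[0]},     # x[0] <= 3
    {'type': 'ineq', 'fun': lambda x: x[1]},         # x[1] >= 0
    {'type': 'ineq', 'fun': lambda x: 1 - x[1]},     # x[1] <= 1
]

min_res_COBYLA = sp_optimize.minimize(cost, x_guess, args=(A, y),
                                      method='COBYLA',
                                      constraints=cons_COBYLA,
                                      options={'disp': True})
print(min_res_COBYLA)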
Alternative
In general, an interior-point based solver for convex Quadratic Programming will be the best approach here. But for this, you need to leave scipy. (Or maybe an SOCP solver would be better... I'm not sure.)
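To make the QP structure of the first example explicit (just an algebraic rewrite; the constant y.dot(y) is dropped since it does not affect the minimizer):

# ||A.dot(x) - y||**2 = 0.5 * x^T P x + q^T x + const, with
P = 2 * A.T.dot(A)
q = -2 * A.T.dot(y)

Any QP solver would take P and q together with the linear inequality x[1] - x[0] <= 0 and the box bounds.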
cvxpy brings a nice modelling system and a good open-source solver (ECOS; although technically a conic solver -> more general and less robust, it should still beat SLSQP).
Using cvxpy and ECOS, this looks like:
import numpy as np
import cvxpy as cvx
""" Problem data """
A = np.ones((10,2)); A[:,0] = np.linspace(-5,5, 10)
x_true = np.array([2, 2/20])
y = A.dot(x_true)
x_guess = x_true / 2
prm_bounds = ((0, 3), (0,1))
# problematic case
A = 100 * A
y = A.dot(x_true)
""" Solve """
x = cvx.Variable(len(x_true))
constraints = [x[0] >= x[1]]
for ind, (lb, ub) in enumerate(prm_bounds):  # inefficient -> matrix-based expr better (see sketch below)!
    constraints.append(x[ind] >= lb)
    constraints.append(x[ind] <= ub)
objective = cvx.Minimize(cvx.norm(A*x - y))
problem = cvx.Problem(objective, constraints)
problem.solve(solver=cvx.ECOS, verbose=False)
print(problem.status)
print(problem.value)
print(x.value.T)
# optimal
# -6.67593652593801e-10
# [[ 2. 0.1]]
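As the comment in the loop above hints, the per-element bound constraints can also be written with vectorized expressions instead of the Python loop (a sketch, equivalent to the loop):

lb = np.array([b[0] for b in prm_bounds])
ub = np.array([b[1] for b in prm_bounds])
constraints = [x[0] >= x[1], x >= lb, x <= ub]  # elementwise bound constraints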