Reputation:
I'm trying to solve a set of equations using scipy.minimize, but I'm not getting satisfactory results, so maybe I'm getting something wrong. I want to solve the following system of equations.
12.25 * (x + y * 2.2 + z * 4.84) - 8.17437483750257 = 0
12.25 * (x + y * 3.1 + z * 9.61) - 21.9317236606432 = 0
12.25 * (x + y * 4 + z * 16) - 107.574834524443 = 0
Using Wolfram Alpha I get the answers
x=22.626570068753, y=-17.950683342597, z=3.6223614029055
Which indeed solve the system of equations, giving a residual error of
9.407585821463726e-12
Now using scipy.minimize I do:
import numpy as np
from scipy.optimize import fsolve
from scipy.optimize import minimize
def my_func(p):
    points = [8.17437483750257, 21.9317236606432, 107.574834524443]
    h1 = abs(12.25 * (p[0] + p[1] * 2.2 + p[2] * 4.84) - points[0])
    h2 = abs(12.25 * (p[0] + p[1] * 3.1 + p[2] * 9.61) - points[1])
    h3 = abs(12.25 * (p[0] + p[1] * 4 + p[2] * 16) - points[2])
    return h1 + h2 + h3
ini = np.array([22, -15, 5]) # Initial points close to solution
res = minimize(my_func, ini)
print(res)
fun: 1.4196640741924451
hess_inv: array([[ 20.79329103, -14.63447889, 2.36145776],
[-14.63447889, 10.30037625, -1.66214485],
[ 2.36145776, -1.66214485, 0.26822135]])
jac: array([ 12.25 , 60.02499545, 254.43249989])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 261
nit: 8
njev: 64
status: 2
success: False
x: array([ 21.39197235, -17.08623345, 3.48344393])
First, it says success=False, and second, it finds a solution that is not optimal.
Why does it fail to find the optimal solution even though the initial values are close to it?
Is something wrong with the way I defined the optimizer?
I also tried running it with initial values of [0, 0, 0], and it gives awful results:
ini = np.array([0, 0, 0])  # Initial points far from the solution
res = minimize(my_func, ini)
print(res)
fun: 73.66496363902732
hess_inv: array([[ 0.98461683, -0.04223651, -0.1207056 ],
[-0.04223651, 0.88596592, -0.31885642],
[-0.1207056 , -0.31885642, 0.13448927]])
jac: array([ 12.25 , 15.92499924, -18.98750019])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 164
nit: 1
njev: 40
status: 2
success: False
x: array([0.02901304, 0.08994042, 0.29448233])
Note: I don't want to use fsolve to find the solutions, but minimize.
The reason is that my real problem involves having more equations than unknowns, so in the end I want a solution that minimizes the errors of all these equations.
However, since it was not giving good results, I wanted to first test it on an easy problem for which an exact solution exists. But even in this case it doesn't work.
Once I make it work for this problem I'll expand it adding more equations.
Upvotes: 0
Views: 1145
Reputation: 2093
Optimisation problems (and solvers) usually benefit from a well-behaved (smooth) "optimisation surface". When you use the abs function, it creates a "choppy" surface, with points where the derivatives are not continuous.
If instead of abs you use a quadratic function (which has the same effect), you get a solution close to what you expect. Just change my_func to:
def my_func(p):
    points = [8.17437483750257, 21.9317236606432, 107.574834524443]
    h1 = (12.25 * (p[0] + p[1] * 2.2 + p[2] * 4.84) - points[0])**2
    h2 = (12.25 * (p[0] + p[1] * 3.1 + p[2] * 9.61) - points[1])**2
    h3 = (12.25 * (p[0] + p[1] * 4 + p[2] * 16) - points[2])**2
    return h1 + h2 + h3
What I got is:
fun: 8.437863292878727e-10
hess_inv: array([[ 0.64753863, -0.43474506, 0.06909179],
[-0.43474506, 0.29487798, -0.04722923],
[ 0.06909179, -0.04722923, 0.00761762]])
jac: array([-6.26698693e-12, 6.22490927e-10, -5.11716516e-10])
message: 'Optimization terminated successfully.'
nfev: 55
nit: 7
njev: 11
status: 0
success: True
x: array([ 22.62653789, -17.95066124, 3.62235782])
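If you want to double-check the result, you can plug res.x back into the original equations and look at the residuals. A quick sketch, reusing the constants and the res object from the code in the question:

sol = res.x
residuals = [
    12.25 * (sol[0] + sol[1] * 2.2 + sol[2] * 4.84) - 8.17437483750257,
    12.25 * (sol[0] + sol[1] * 3.1 + sol[2] * 9.61) - 21.9317236606432,
    12.25 * (sol[0] + sol[1] * 4 + sol[2] * 16) - 107.574834524443,
]
print(residuals)  # each value should be very close to zero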
Upvotes: 1
Reputation: 44906
...my real problem involves having more equations than unknowns, so in the end I want a solution that minimizes the errors of all these equations
This sounds a lot like the problem solved in the generalized method of moments (GMM), where you also have more equations than unknowns.
Problems of this kind are usually solved using least squares. Suppose your whole system looks like this:
h1(x, y, z) = 0
h2(x, y, z) = 0
h3(x, y, z) = 0
h4(x, y, z) = 0
It has 3 unknowns and 4 equations. Then your objective function will be:
F(x, y, z) = H(x, y, z)' * W * H(x, y, z)
where:
- H(x, y, z) is the vector of all hj(x, y, z) above
- H(x, y, z)' is its transpose
- W is the weighting matrix
If W is the identity matrix, you get the least squares objective function. Then F(x, y, z) is a quadratic form (basically a parabola in multiple dimensions), which should be simple to optimize because it's convex and smooth.
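A minimal sketch of that objective in NumPy/SciPy, assuming an identity W and using the three equations from your question (so it reduces to ordinary least squares):

import numpy as np
from scipy.optimize import minimize

points = np.array([8.17437483750257, 21.9317236606432, 107.574834524443])

def H(p):
    # Vector of residuals h_j(x, y, z), one entry per equation
    return np.array([
        12.25 * (p[0] + p[1] * 2.2 + p[2] * 4.84) - points[0],
        12.25 * (p[0] + p[1] * 3.1 + p[2] * 9.61) - points[1],
        12.25 * (p[0] + p[1] * 4 + p[2] * 16) - points[2],
    ])

W = np.eye(3)  # identity weighting matrix -> plain least squares

def F(p):
    h = H(p)
    return h @ W @ h  # H(x, y, z)' * W * H(x, y, z)

res = minimize(F, np.array([22, -15, 5]))
print(res.x)  # should be close to the Wolfram Alpha solution

With more equations than unknowns, you just append extra residuals to H (and enlarge W accordingly); F stays the same.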
Your code uses absolute values like h1 = abs(12.25 * (p[0] + p[1] * 2.2 + p[2] * 4.84) - points[0]), but abs can be difficult to differentiate near the origin, yet that's exactly where your optimum lies, since you essentially want h1 to equal zero.
You can approximate the absolute value function by squaring the error:
h1 = (12.25 * (p[0] + p[1] * 2.2 + p[2] * 4.84) - points[0])**2
This results in basically the same approach as the GMM (or least squares) and gives you a function that's easy to optimize since the square is smooth near the origin.
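As a side note (just a sketch of an alternative, not something you have to do): SciPy also provides scipy.optimize.least_squares, which accepts the residual vector directly, so you don't have to form the squared objective yourself. Reusing the H function from the sketch above:

from scipy.optimize import least_squares

# Minimizes 0.5 * sum(H(p)**2), i.e. the unweighted least squares objective
res_ls = least_squares(H, x0=np.array([22, -15, 5]))
print(res_ls.x)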
Upvotes: 1