Reputation: 588
In the documentation for scipy.optimize.root with method='lm', the following are the defaults for the options keyword.
options={'col_deriv': 0, 'diag': None, 'factor': 100, 'gtol': 0.0, 'eps': 0.0, 'func': None, 'maxiter': 0, 'xtol': 1.49012e-08, 'ftol': 1.49012e-08}
In the description of col_deriv, the documentation says:
col_deriv : bool, optional
non-zero to specify that the Jacobian function computes derivatives down the columns (faster, because there is no transpose operation)
If I understand the statement correctly, it says that if I write col_deriv=True, for example, the Jacobian will be computed column-wise and therefore faster.
Question: If it is faster, why isn't a non-zero value for col_deriv the default?
Am I missing something here?
Upvotes: 1
Views: 1409
Reputation: 8378
Maybe the documentation for scipy.optimize.leastsq can help, since it documents both Dfun (the Jacobian) and col_deriv. From Dfun we get:
Dfun : callable, optional
A function or method to compute the Jacobian of func with derivatives across the rows. If this is None, the Jacobian will be estimated.
From col_deriv
we get:
col_deriv : bool, optional
non-zero to specify that the Jacobian function computes derivatives down the columns (faster, because there is no transpose operation).
My reading of this is as follows:
By default, scipy expects the function that computes the Jacobian matrix to return a matrix that follows the "normal" definition (see, e.g., https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant).
However, scipy itself calls other routines, possibly written in Fortran (see, e.g., MINPACK), which expect the derivatives (with respect to the coordinates) to be placed in columns.
It follows that if the function that computes the Jacobian matrix returns a matrix with derivatives placed along columns instead of rows, scipy does not need to transpose the Jacobian matrix before passing it to the MINPACK routine, thus saving computational time.
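As a minimal sketch of what this means in practice (the residual function, starting point, and Jacobians below are made up purely for illustration), the same problem can be fed to scipy.optimize.leastsq with either layout:
import numpy as np
from scipy.optimize import leastsq

def fun(x):
    # made-up residuals for illustration
    return [x[0]**2 + x[1] - 3.0, x[0] - x[1]**2 + 1.0]

def jac_rows(x):
    # conventional layout: row i holds the derivatives of fun(x)[i]
    return np.array([[2*x[0], 1.0],
                     [1.0, -2*x[1]]])

def jac_cols(x):
    # the same derivatives placed down the columns, i.e. the transpose,
    # which is the layout the MINPACK routines work with internally
    return jac_rows(x).T

x0 = [1.0, 1.0]
sol_rows = leastsq(fun, x0, Dfun=jac_rows)[0]
sol_cols = leastsq(fun, x0, Dfun=jac_cols, col_deriv=1)[0]
print(sol_rows, sol_cols)  # both calls should converge to the same point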
Upvotes: 2
Reputation:
The default corresponds to the mathematical convention for writing the Jacobian matrix, as recorded on Wikipedia among other places: the first row of the matrix consists of the partial derivatives of the first component of the function, etc. This works better in multivariable calculus, because one can multiply that matrix on the right by the column vector of variables to obtain the linear approximation to the function.
For example, let
fun = lambda x: [x[0]**3-x[1], 2*x[0]+x[1]-12]
The conventional way of writing the Jacobian would be
jac = lambda x: [[3*x[0]**2, -1], [2, 1]]
where [3*x[0]**2, -1]
comes from differentiating the first component of fun
, x[0]**3-x[1]
.
The method root(fun, [0, 0], jac=jac) converges in 12 iterations. Or we could write the Jacobian in transposed form,
jact = lambda x: [[3*x[0]**2, 2], [-1, 1]]
and use
root(fun, [0, 0], jac=jact, options={"col_deriv": 1})
to the same effect. It's not obvious to me that the gain is worth it, but perhaps it is for very large systems.
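For completeness, here is that comparison as one self-contained snippet (exact iteration counts may differ between SciPy versions):
from scipy.optimize import root

fun = lambda x: [x[0]**3 - x[1], 2*x[0] + x[1] - 12]
jac = lambda x: [[3*x[0]**2, -1], [2, 1]]    # conventional, row-wise Jacobian
jact = lambda x: [[3*x[0]**2, 2], [-1, 1]]   # transposed, column-wise Jacobian

sol_rows = root(fun, [0, 0], jac=jac)
sol_cols = root(fun, [0, 0], jac=jact, options={"col_deriv": 1})
print(sol_rows.x)  # both calls should land on the same solution
print(sol_cols.x)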
If "col_deriv": 1
was the default, there'd be unhappy users who implemented the Jacobian in the way they are used to and found the performance is poor (using an incorrect version of Jacobian in the above example more than doubles the number of steps of root search).
Upvotes: 2