Reputation: 900
I have a loss function that needs to be minimized:
def loss(x: np.ndarray[float]) -> float
My problem has nDim=10 dimensions. The loss function works with 1D arrays of shape (nDim,) and with 2D arrays of shape (nSample, nDim) for an arbitrary number of samples. Because of the nature of its implementation (NumPy), it is significantly faster to make a single call to the loss function with several samples packed into a 2D argument than to make several 1D calls.
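To illustrate the calling convention, here is a minimal sketch with a dummy quadratic objective standing in for the real loss (the actual loss is far more complicated):
import numpy as np

nDim = 10

def loss(x: np.ndarray) -> np.ndarray:
    # Dummy stand-in for the real loss: reduces over the last axis,
    # so it accepts both (nDim,) and (nSample, nDim) inputs.
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2, axis=-1)

loss(np.zeros(nDim))         # single sample -> scalar
loss(np.zeros((11, nDim)))   # 11 samples in one (much cheaper) call -> shape (11,)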
The minimizer I am currently running is
sol = scipy.optimize.basinhopping(loss, x0, minimizer_kwargs={"method": "SLSQP"})
It does the job, but it is too slow. At the moment, the minimizer makes single 1D calls to the loss function. Based on observing the sample points, it seems that SLSQP is performing numerical differentiation, sampling 11 points for each sample to estimate the gradient. In theory, it should be possible to implement this minimizer with vectorized function calls, requesting all 11 sample points from the loss function simultaneously.
I was hoping that there would be a vectorize flag for SLSQP, but that does not seem to be the case; please correct me if I am wrong.
Note also that the loss function is far too complicated for analytic calculation of derivatives, so an explicit Jacobian is not an option.
Question: Does SciPy or any other Python minimization library support global optimization strategies (such as basinhopping) with a vectorized loss function and no available Jacobian?
Upvotes: 1
Views: 59
Reputation: 25220
Based on observing the sample points, it seems that SLSQP is performing numerical differentiation, sampling 11 points for each sample to estimate the gradient. In theory, it should be possible to implement this minimizer with vectorized function calls, requesting all 11 sample points from the loss function simultaneously.
Yes, you can do that.
Here's an example. It implements a 2-point numerical differentiation approach, based on the code in scipy/optimize/_numdiff.py. It's very similar to what SciPy does when you don't give it a Jacobian.
The code optimizes the eggholder function, which is designed to have lots of local minima to get stuck in.
The important code is inside loss_to_jac(), which wraps the loss function in a callable that numerically estimates its Jacobian.
import scipy.optimize
import numpy as np
import functools
import time
import matplotlib.pyplot as plt
def eggholder(x):
    # vectorized version of the eggholder function, originally from the
    # scipy docs: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.shgo.html
    return (-(x[..., 1] + 47) * np.sin(np.sqrt(abs(x[..., 0]/2 + (x[..., 1] + 47))))
            - x[..., 0] * np.sin(np.sqrt(abs(x[..., 0] - (x[..., 1] + 47)))))
def plot_eggholder():
    x = np.arange(-512, 513)
    y = np.arange(-512, 513)
    xgrid, ygrid = np.meshgrid(x, y)
    xy = np.stack([xgrid, ygrid]).transpose(1, 2, 0)
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111, projection='3d')
    ax.view_init(45, -45)
    ax.plot_surface(xgrid, ygrid, eggholder(xy), cmap='terrain')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    plt.show()
def loss_to_jac(func, N, eps=None):
    if eps is None:
        # Use square root of machine precision by default
        eps = (np.finfo(float).eps) ** 0.5
    array_index = np.arange(N)
    def jac(x0, *args, **kwargs):
        x0 = np.asarray(x0)
        evaluated_points = np.zeros((N + 1, N))
        evaluated_points += x0
        abs_step = x0 * eps
        # Avoid step of zero
        dx = ((x0 + abs_step) - x0)
        abs_step = np.where(dx == 0, eps, abs_step)
        # Find actual dx after rounding error, and replacing zeros
        dx = ((x0 + abs_step) - x0)
        # Add the step to the diagonal of the N perturbed rows of evaluated_points
        evaluated_points[1:][array_index, array_index] += abs_step
        # Evaluate x0 and all N perturbed points in a single vectorized call
        result = func(evaluated_points, *args, **kwargs)
        f0 = result[0]
        df = result[1:] - f0
        J = df / dx
        return J
    return jac
def main():
    plot_eggholder()
    x0 = np.array([0, 0])
    jac_from_loss = loss_to_jac(eggholder, len(x0))
    for jac in [None, jac_from_loss]:
        start = time.time()
        print("Using jacobian:", jac)
        result = scipy.optimize.basinhopping(
            eggholder,
            x0,
            minimizer_kwargs=dict(
                method="SLSQP",
                jac=jac,
            ),
            niter=1000
        )
        print(result)
        print(f"Duration: {time.time() - start:.2f}")
        print("\n" * 2)

if __name__ == "__main__":
    main()
On my system, optimizing with the vectorized Jacobian is about 40% faster. The exact speedup will depend on how many dimensions you have: eggholder is only a 2-dimensional function, so it needs fewer evaluations per numerical gradient than your 10-dimensional problem.
Upvotes: 1
Reputation: 3683
differential_evolution is a global optimizer that does not require gradients. It has a vectorized keyword to enable many function evaluations in a single call.
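As a minimal sketch (with a dummy quadratic loss and made-up bounds standing in for the real problem): with vectorized=True the solver passes candidates as an array of shape (nDim, S), so a loss that expects (nSample, nDim) needs a transpose.
import numpy as np
from scipy.optimize import differential_evolution

nDim = 10

def loss(x):
    # Dummy stand-in for the real loss; reduces over the last axis,
    # so it accepts (nDim,) and (nSample, nDim) inputs.
    return np.sum(np.asarray(x, dtype=float) ** 2, axis=-1)

bounds = [(-5.0, 5.0)] * nDim   # made-up bounds; differential_evolution requires them

result = differential_evolution(
    lambda x: loss(x.T),        # (nDim, S) -> (S, nDim) for the loss
    bounds,
    vectorized=True,
    updating="deferred",        # vectorized evaluation uses deferred updating
)
print(result.x, result.fun)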
Alternatively, you could write a function that computes the Jacobian with scipy.differentiate.jacobian, which calls the function at all required points at once, and pass that as the Jacobian callable. However, it is designed for accuracy, not speed, so you should probably set coarse tolerances and a low order.
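A rough sketch of that approach, again with a dummy loss; it assumes SciPy >= 1.15 (where scipy.differentiate exists), and the tolerance/order values are only illustrative:
import numpy as np
from scipy import differentiate, optimize

nDim = 10

def loss(x):
    # Dummy stand-in for the real loss; accepts (nDim,) or (nSample, nDim).
    return np.sum(np.asarray(x, dtype=float) ** 2, axis=-1)

def jac(x):
    # scipy.differentiate.jacobian evaluates its callable on arrays whose
    # first axis holds the nDim components, so move that axis to the end
    # to match the (nSample, nDim) layout the loss expects.
    res = differentiate.jacobian(
        lambda p: loss(np.moveaxis(p, 0, -1)),
        np.asarray(x, dtype=float),
        tolerances=dict(rtol=1e-3, atol=1e-6),  # coarse tolerances for speed
        order=2,                                # low order -> fewer evaluations
    )
    return res.df  # gradient of shape (nDim,) for a scalar-valued loss

sol = optimize.basinhopping(
    loss, np.zeros(nDim),
    minimizer_kwargs={"method": "SLSQP", "jac": jac},
)
print(sol.x, sol.fun)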
Upvotes: 2