Reputation: 23637
I need to minimize a cost function for a large number (1000s) of different inputs. Obviously, this can be implemented by looping over scipy.optimize.minimize
or any other minimization routine. Here is an example:
import numpy as np
import scipy as sp

def cost(x, a, b):
    return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

a = np.random.randn(500, 40)
b = np.array(np.arange(500))

x = []
for i in range(a.shape[0]):
    res = sp.optimize.minimize(cost, np.zeros(40), args=(a[None, i], b[None, i]))
    x.append(res.x)
This finds the x[i, :] that minimizes cost for each a[i, :] and b[i], but it is very slow. I guess looping over minimize causes considerable overhead.
A partial solution is to solve for all x simultaneously:
res = sp.optimize.minimize(cost, np.zeros_like(a), args=(a, b))
This is even slower than the loop. minimize does not know that the elements of x are group-wise independent, so it computes the full Hessian, although a block-diagonal matrix would be sufficient given the problem structure. This is slow and overflows my computer's memory.
Is there any way to inform minimize or another optimization function about the problem structure so that it can solve multiple independent optimizations in a single function call? (Similar to certain options supported by Matlab's fsolve.)
Upvotes: 3
Views: 2566
Reputation: 23637
First, a solution: it turns out scipy.optimize.least_squares supports exploiting the structure of the Jacobian by setting the jac_sparsity argument.
The least_squares function works slightly differently from minimize, so the cost function needs to be rewritten to return residuals instead:
def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b
The Jacobian has a block-diagonal sparsity structure, given by:
jacs = sp.sparse.block_diag([np.ones((1, 40), dtype=bool)]*500)
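As a quick sanity check (my addition, assuming the same 500×40 problem size as above), the sparsity pattern has one row per residual and one column per unknown:

# jacs.shape == (500, 20000): 500 residuals, 500*40 unknowns;
# residual i depends only on x[i*40:(i+1)*40], so there are 500*40 nonzeros
print(jacs.shape)
print(jacs.nnz)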
And calling the optimization routine:
res = sp.optimize.least_squares(residuals, np.zeros(500*40),
                                jac_sparsity=jacs, args=(a, b))
x = res.x.reshape(500, 40)
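Since each row contributes a single equation in 40 unknowns, the optimizer should be able to drive every residual to (approximately) zero. A quick check of that, not part of the original answer, could be:

# maximum absolute residual at the solution; should be close to zero
print(np.abs(residuals(res.x, a, b)).max())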
But is it really faster?
%timeit opt1_loopy_min(a, b) # 1 loop, best of 3: 2.43 s per loop
%timeit opt2_loopy_min_start(a, b) # 1 loop, best of 3: 2.55 s per loop
%timeit opt3_loopy_lsq(a, b) # 1 loop, best of 3: 13.7 s per loop
%timeit opt4_dense_lsq(a, b) # ValueError: array is too big; ...
%timeit opt5_jacs_lsq(a, b) # 1 loop, best of 3: 1.04 s per loop
Conclusions:
There is no significant difference between looping over minimize (opt1) and re-using the previous result as the starting point (opt2) without sorting.
Looping over least_squares (opt3) is considerably slower than looping over minimize (opt1, opt2).
Solving all problems at once with least_squares (opt4) is not possible because the dense Jacobian matrix does not fit in memory.
Exploiting the sparsity structure of the Jacobian with least_squares (opt5) seems to be the fastest approach.
This is the timing test environment:
import numpy as np
import scipy as sp

def cost(x, a, b):
    return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b

a = np.random.randn(500, 40)
b = np.arange(500)

def opt1_loopy_min(a, b):
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.minimize(cost, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
    return np.stack(x)

def opt2_loopy_min_start(a, b):
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.minimize(cost, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
        x0 = res.x
    return np.stack(x)

def opt3_loopy_lsq(a, b):
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.least_squares(residuals, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
    return x

def opt4_dense_lsq(a, b):
    res = sp.optimize.least_squares(residuals, np.zeros(a.size), args=(a, b))
    return res.x.reshape(a.shape)

def opt5_jacs_lsq(a, b):
    jacs = sp.sparse.block_diag([np.ones((1, a.shape[1]), dtype=bool)] * a.shape[0])
    res = sp.optimize.least_squares(residuals, np.zeros(a.size),
                                    jac_sparsity=jacs, args=(a, b))
    return res.x.reshape(a.shape)
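The %timeit results above come from IPython; as a plain-Python alternative (my own sketch, timing a single run of a few of the variants), the standard timeit module could be used:

import timeit

# one run per variant; absolute numbers depend on hardware and SciPy version
for f in (opt1_loopy_min, opt2_loopy_min_start, opt5_jacs_lsq):
    print(f.__name__, timeit.timeit(lambda: f(a, b), number=1))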
Upvotes: 4
Reputation:
"I guess looping over minimize causes considerable overhead."
Wrong guess. The time required for minimizing a function dwarfs any loop overhead. There is no vectorization magic for this problem.
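As a rough illustration of why (my own snippet, reusing cost, a, b and the imports from the question), compare one actual minimization against the bookkeeping of a single loop iteration:

import timeit

# one minimize call vs. the indexing done per loop iteration
one_solve = timeit.timeit(
    lambda: sp.optimize.minimize(cost, np.zeros(40), args=(a[None, 0], b[None, 0])),
    number=10) / 10
one_iter = timeit.timeit(lambda: (a[None, 0], b[None, 0]), number=10000) / 10000
print(one_solve, one_iter)  # the solve is orders of magnitude slower than the loop bookkeeping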
Some time can be saved by using a better starting point for each minimization. First, sort the parameters so that consecutive iterations have similar parameters. Then use the result of the previous minimization as the starting point of the next one:
a = np.sort(np.random.randn(500, 40), axis=0)  # sorted parameters
b = np.arange(500)  # no need for np.array here, np.arange is already an ndarray

x = []
x0 = np.zeros(40)
for i in range(a.shape[0]):
    res = minimize(cost, x0, args=(a[None, i], b[None, i]))
    x.append(res.x)
    x0 = res.x  # warm start: reuse the previous solution
This saves 30-40 percent of execution time in my test.
Another, minor, optimization is to preallocate an ndarray of the appropriate size for the resulting x-values instead of using a list and its append method: before the loop, x = np.zeros((500, 40)); within the loop, x[i, :] = res.x. A combined sketch follows below.
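A minimal sketch combining the preallocation with the warm start (my own combination, reusing the names from the snippet above):

x = np.zeros((500, 40))  # preallocate the result array
x0 = np.zeros(40)
for i in range(a.shape[0]):
    res = minimize(cost, x0, args=(a[None, i], b[None, i]))
    x[i, :] = res.x  # write directly into the preallocated array
    x0 = res.x       # warm start for the next problem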
Upvotes: 2