MB-F

Reputation: 23637

Solve multiple independent optimizations in scipy

I need to minimize a cost function for a large number (1000s) of different inputs. Obviously, this can be implemented by looping over scipy.optimize.minimize or any other minimization routine. Here is an example:

import numpy as np
import scipy as sp

def cost(x, a, b):
    return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

a = np.random.randn(500, 40)
b = np.array(np.arange(500))

x = []
for i in range(a.shape[0]):
    res = sp.optimize.minimize(cost, np.zeros(40), args=(a[None, i], b[None, i]))
    x.append(res.x)

This finds the x[i, :] that minimizes the cost for each a[i, :] and b[i], but it is very slow. I guess looping over minimize causes considerable overhead.

A partial solution is to solve for all x simultaneously:

res = sp.optimize.minimize(cost, np.zeros_like(a), args=(a, b))

This is even slower than the loop, because minimize does not know that the elements of x are group-wise independent. It therefore computes the full Hessian, although a block-diagonal matrix would be sufficient given the problem structure. This is slow and overflows my computer's memory.
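
To make the group-wise independence concrete, here is a quick sanity check (a sketch, reusing cost, a, and b from above): the total cost is just the sum of the per-row costs, so each row's block of x could in principle be optimized on its own.

# Sanity check (sketch): the total cost decomposes into independent per-row costs,
# so optimizing the rows jointly or separately targets the same per-row optima.
x_test = np.random.randn(500, 40)
total = cost(x_test, a, b)
per_row = sum(cost(x_test[i], a[None, i], b[None, i]) for i in range(a.shape[0]))
assert np.isclose(total, per_row)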

Is there any way to inform minimize (or another optimization function) about the problem structure so that it can solve multiple independent optimizations in a single function call? (Similar to certain options supported by MATLAB's fsolve.)

Upvotes: 3

Views: 2566

Answers (2)

MB-F

Reputation: 23637

First, a solution:

It turns out that scipy.optimize.least_squares supports exploiting the structure of the Jacobian by setting the jac_sparsity argument.

The least_squares function works slightly differently from minimize, so the cost function needs to be rewritten to return residuals instead:

def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b

The Jacobian has a block-diagonal sparsity structure, given by

jacs = sp.sparse.block_diag([np.ones((1, 40), dtype=bool)]*500)

And calling the optimization routine:

res = sp.optimize.least_squares(residuals, np.zeros(500*40),
                                jac_sparsity=jacs, args=(a, b))
x = res.x.reshape(500, 40)

But is it really faster?

%timeit opt1_loopy_min(a, b)        # 1 loop, best of 3: 2.43 s per loop
%timeit opt2_loopy_min_start(a, b)  # 1 loop, best of 3: 2.55 s per loop
%timeit opt3_loopy_lsq(a, b)        # 1 loop, best of 3: 13.7 s per loop
%timeit opt4_dense_lsq(a, b)        # ValueError: array is too big; ...
%timeit opt5_jacs_lsq(a, b)         # 1 loop, best of 3: 1.04 s per loop

Conclusions:

  1. There is no obvious difference between the original solution (opt1) and reuse of the starting point (opt2) without sorting.
  2. Looping over least_squares (opt3) is considerably slower than looping over minimize (opt1, opt2).
  3. The problem is too big to run naively with least_squares (opt4) because the dense Jacobian matrix does not fit in memory.
  4. Exploiting the sparsity structure of the Jacobian in least_squares (opt5) seems to be the fastest approach.

This is the timing test environment:

import numpy as np
import scipy as sp

def cost(x, a, b):
    return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b

a = np.random.randn(500, 40)
b = np.arange(500)

def opt1_loopy_min(a, b):
    # loop over rows, calling minimize with a fixed zero starting point
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.minimize(cost, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
    return np.stack(x)

def opt2_loopy_min_start(a, b):
    # like opt1, but warm-start each minimization from the previous result
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.minimize(cost, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
        x0 = res.x
    return np.stack(x)

def opt3_loopy_lsq(a, b):
    # loop over rows, calling least_squares on the residuals
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.least_squares(residuals, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
    return x

def opt4_dense_lsq(a, b):
    # single least_squares call on the full problem with a dense Jacobian
    res = sp.optimize.least_squares(residuals, np.zeros(a.size), args=(a, b))
    return res.x.reshape(a.shape)

def opt5_jacs_lsq(a, b):
    # single least_squares call with block-diagonal Jacobian sparsity
    jacs = sp.sparse.block_diag([np.ones((1, a.shape[1]), dtype=bool)]*a.shape[0])
    res = sp.optimize.least_squares(residuals, np.zeros(a.size), jac_sparsity=jacs, args=(a, b))
    return res.x.reshape(a.shape)
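
For completeness, a quick check that the sparse-Jacobian result is as good as the loopy one (a sketch, assuming the functions above are in scope; it compares achieved costs rather than x values, since each row's problem is underdetermined and different solvers may return different minimizers):

# Compare achieved costs; both should be close to zero for this problem.
x_loop = opt1_loopy_min(a, b)
x_jacs = opt5_jacs_lsq(a, b)
print(cost(x_loop, a, b), cost(x_jacs, a, b))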

Upvotes: 4

user6655984

Reputation:

I guess looping over minimize causes considerable overhead.

Wrong guess. The time required for minimizing a function dwarfs any loop overhead. There is no vectorization magic for this problem.

Some time can be saved by using a better starting point for the minimization. First, sort the parameters so that consecutive iterations have similar parameters. Then use the end point of the previous minimization as the starting point of the next one:

from scipy.optimize import minimize

a = np.sort(np.random.randn(500, 40), axis=0)   # sorted parameters
b = np.arange(500)   # no need for np.array here, np.arange already returns an ndarray

x = []               # collect the results
x0 = np.zeros(40)    # starting point, updated after every iteration
for i in range(a.shape[0]):
    res = minimize(cost, x0, args=(a[None, i], b[None, i]))   # cost() as defined in the question
    x.append(res.x)
    x0 = res.x       # warm-start the next minimization

This saves 30-40 percent of execution time in my test.

Another minor optimization is to preallocate an ndarray of appropriate size for the resulting x values, instead of using a list and its append method: before the loop, x = np.zeros((500, 40)); within the loop, x[i, :] = res.x. A sketch of this follows below.
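
Put together, the preallocated warm-start loop might look like this (a sketch of the two suggestions above):

x = np.zeros((500, 40))   # preallocated result array instead of a list
x0 = np.zeros(40)
for i in range(a.shape[0]):
    res = minimize(cost, x0, args=(a[None, i], b[None, i]))
    x[i, :] = res.x       # write directly into the preallocated row
    x0 = res.x            # warm-start the next minimization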

Upvotes: 2
