John Sorensen

Reputation: 960

Faster SciPy Optimizations

I need to find the minimum of a cost function with several thousand variables. The cost function is simply a least squares calculation and can be computed quickly with numpy vectorization. Despite this, the optimization still takes a painfully long time. My guess is that the slowdown is occurring in SciPy's minimizer rather than in my cost function. How can I change the parameters of SciPy's minimizer to speed up the runtime?

Sample Code:

import numpy as np
from scipy.optimize import minimize

# random data
x = np.random.randn(100, 75)

# initial weights guess
startingWeights = np.ones(shape=(100, 75))

# random y vector 
y = np.random.randn(100)


def costFunction(weights):
   # reshapes flattened weights into 2d matrix
   weights = np.reshape(weights, newshape=(100, 75))

   # weighted row-wise sum
   weighted = np.sum(x * weights, axis=1)

   # squared residuals
   residualsSquared = (y - weighted) ** 2

   return np.sum(residualsSquared)


result = minimize(costFunction, startingWeights.flatten())

Upvotes: 3

Views: 4612

Answers (3)

JeeyCi

Reputation: 589

I also checked joni's suggestion with autograd:

import numpy as np
import autograd.numpy as anp
from autograd import  value_and_grad

# random data
X = np.random.randn(100, 75)

# initial weights guess
startingWeights = np.ones(shape=(100, 75))

# random y vector
y = np.random.randn(100)

def costFunction(weights):
   # reshapes flattened weights into 2d matrix
   weights = anp.reshape(weights, (100, 75))

   # weighted row-wise sum
   weighted = anp.sum(X * weights, axis=1)

   # squared residuals
   residualsSquared = (y - weighted) ** 2

   return anp.sum(residualsSquared)

# create the derivatives
obj_and_grad = value_and_grad(costFunction)

%timeit obj_and_grad(startingWeights.flatten())
# 391 µs ± 69.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

and a direct (factorization-based) method for the LS solution, comparing a custom normal-equations solve against scipy.linalg.lstsq with lapack_driver='gelsy':

import numpy as np
from scipy.linalg import lstsq as sp_lstsq
from scipy.linalg import solve

# random data
X = np.random.randn(100, 75)

# random y vector
y = np.random.randn(100)

# custom OLS via LU factorization of the normal equations
def ord_lu(X, y):
    A = X.T @ X
    b = X.T @ y
    beta = solve(A, b, overwrite_a=True, overwrite_b=True,
                 check_finite=True)
    return beta

%timeit -n1000 -r7 result = ord_lu(X, y)
# 181 µs ± 5.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit -n1000 -r7 result = sp_lstsq(X, y, lapack_driver='gelsy', check_finite=True)
# 278 µs ± 7.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • both direct solves seem faster than the autograd-based evaluation above

P.S. QR factorization can also be used, for sparse matrices and other cases; there is further research on such approaches.
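
For reference, a minimal sketch of a QR-based solve of the same least-squares system (my illustration, not part of the original answer; it assumes the X and y defined above and uses numpy.linalg.qr with scipy.linalg.solve_triangular):

import numpy as np
from scipy.linalg import solve_triangular

# random data, same shapes as above
X = np.random.randn(100, 75)
y = np.random.randn(100)

# economic QR factorization: X = Q @ R with Q (100, 75) and R (75, 75) upper triangular
Q, R = np.linalg.qr(X)

# back-substitution on the triangular system R @ beta = Q.T @ y
beta = solve_triangular(R, Q.T @ y)

Unlike the normal-equations/LU variant above, this never forms X.T @ X, whose condition number is the square of that of X.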

P.P.S. I don't know the aim of your calculations, but I would rather remove the multicollinearity in the features (x), e.g. reduce them down to one principal component, before applying any LS method, even a direct one (a PCA sketch follows the snippet below):

import numpy as np

# generate x and y
x = np.random.randn(100, 1)
y = np.random.randn(100)

# assemble the design matrix A = [x, 1] (feature column plus an intercept column)
A = np.hstack((x, np.ones(shape=(100, 1))))

# turn y into a column vector
y = y[:, np.newaxis]

# direct least-squares regression via the normal equation
# (not recommended: numerically unstable for ill-conditioned A)
alpha = np.linalg.inv(A.T @ A) @ A.T @ y
print(alpha)


%timeit -n1000 -r7 result = alpha
# 52.1 ns ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
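
A minimal sketch of the PCA reduction mentioned in the P.P.S. (my illustration, not code from the answer): reduce the random features X from above to a single principal component via SVD and then run an ordinary LS fit on that one feature.

import numpy as np

X = np.random.randn(100, 75)
y = np.random.randn(100)

# center the features and take the first principal component via SVD
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]                      # scores on the first principal component, shape (100,)

# ordinary least squares on the single reduced feature plus an intercept
A = np.column_stack((pc1, np.ones(len(pc1))))
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)                           # slope on PC1 and intercept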

P.P.P.S. Perhaps "A Classical Least Squares Method for Quantitative Spectral Analysis with Python" was the aim?

Upvotes: 0

JeeyCi

Reputation: 589

The cost function is simply a least squares calculation

Do not use minimize; use least_squares, if that is exactly what you need. Example:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import least_squares

x = np.array([0.23, 0.66, 0.93, 1.25, 1.75, 2.03, 2.24, 2.57, 2.87, 2.98])
y = np.array([0.25, -0.27, -1.12, -0.45, 0.28, 0.13, -0.27, 0.26, 0.58, 1.03])
plt.plot(x, y, 'bv')
plt.show()

def model(theta, t):
    # a*e^t + b*ln(t) + c*sin(t) + d*cos(t)
    return theta[0]*np.exp(t) + theta[1]*np.log(t) + theta[2]*np.sin(t) + theta[3]*np.cos(t)

def residual(theta, t, y):
    # residuals of the model against the observed y
    return (model(theta, t) - y).flatten()

theta = np.array([0.5, 0.5, 0.5, 0.5])

res = least_squares(residual, theta, args=(x, y), verbose=1)   # optionally pass jac=
print(res)

x_test = np.linspace(0.1, 3)   # start above 0 to avoid log(0) in the model
y_test = model(res.x, x_test)
plt.plot(x,y, 'bo', x_test, y_test, 'k-')
plt.show()
  • As you can see, there is no place for your huge x0 = startingWeights of shape (100, 75). Even for your minimize approach you would only need x0 of shape (75,) as an initial guess of the coefficients, if you really need one at all, because x0 is just the initial coefficient parameter (theta) to optimize against your (100, 75) input data xs (see the sketch after this list).

  • The optimization procedure is executed locally for each of your data points (the 100 ys you have), even if regularization logic is added.

  • Is your problem formalized correctly? Would a distance metric be preferable to optimization? Could you choose a global optimization algorithm to minimize over the whole space you have, or manually create a Jacobian function for the minimization method (passed as a parameter, as joni answered)?

  • In any case, I think you had better not reformulate your convex problem as LS (or consider that only for recasting a linear LS problem as a convex problem) if in reality you need the global optimum over data of shape (100, 75).

  • Again, I doubt that your data really has that many (75) linearly independent dimensions; regularization can help. Either change your objective for minimization or apply any of the tips mentioned above.
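
A minimal sketch of what the first bullet means for the original problem (my assumption about the intended model, not code from the answer): one weight per feature, so x0 has shape (75,) instead of (100, 75).

import numpy as np
from scipy.optimize import least_squares

X = np.random.randn(100, 75)
y = np.random.randn(100)

# residuals of a linear model with one weight per feature (75 unknowns, 100 residuals)
def residual(w):
    return X @ w - y

w0 = np.ones(75)
res = least_squares(residual, w0)
print(res.cost, res.x.shape)

With only 75 unknowns the Jacobian is (100, 75), so even finite-difference derivatives stay cheap.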

If it is, see LS optimization, where several options are described: the Jacobi method, preconditioning, conjugate-gradient methods for sparse matrices (which is what your multidimensional data can become, judging by the 2-norm condition number, if your data is spatial), rotation for a better choice of step direction in minimization, and so on.
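
As a concrete example of the iterative route, scipy.sparse.linalg.lsqr solves the least-squares problem without forming the normal equations and also accepts sparse matrices (a sketch with the dense random data from the question; the choice of lsqr here is mine, not the answer's):

import numpy as np
from scipy.sparse.linalg import lsqr

X = np.random.randn(100, 75)
y = np.random.randn(100)

# iterative least-squares solve; X may also be a scipy.sparse matrix
beta = lsqr(X, y)[0]
print(beta.shape)   # (75,)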

P.S. When matrices grow large, matrix factorization becomes necessary.

Upvotes: 0

joni

Reputation: 7157

As already noted in the comments, it's highly recommended to provide the exact objective gradient for a large problem with N = 100*75 = 7500 variables. If no gradient is provided, it will be approximated by finite differences via the approx_derivative function. However, finite differences are error-prone and computationally expensive, since each evaluation of the gradient requires 2*N evaluations of the objective function (without caching).

This can be easily illustrated by timing the objective and the approximated gradient:

In [7]: %timeit costFunction(startingWeights.flatten())
23.5 µs ± 2.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: from scipy.optimize._numdiff import approx_derivative

In [9]: %timeit approx_derivative(costFunction, startingWeights.flatten())
633 ms ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Consequently, each gradient evaluation takes more than half of a second on my machine! A more efficient approach to evaluate the gradient is algorithmic differentiation. Using the JAX library, it's quite easy:

import jax.numpy as jnp
from jax import jit, value_and_grad

def costFunction(weights):
   # reshapes flattened weights into 2d matrix
   weights = jnp.reshape(weights, newshape=(100, 75))

   # weighted row-wise sum
   weighted = jnp.sum(x * weights, axis=1)

   # squared residuals
   residualsSquared = (y - weighted) ** 2

   return jnp.sum(residualsSquared)

# create the derivatives
obj_and_grad = jit(value_and_grad(costFunction))

Here, value_and_grad creates a function that evaluates the objective and the gradient and returns both, i.e. obj_value, grad_values = obj_and_grad(x0). So let's time this function:

In [12]: %timeit obj_and_grad(startingWeights.flatten())
132 µs ± 6.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Thus, we now evaluate the objective and the gradient nearly 5000 times faster than before. Finally, we can tell minimize that our objective function returns the objective and the gradient by setting jac=True. So

minimize(obj_and_grad, x0=startingWeights.flatten(), jac=True)

should significantly speed up the optimization.
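
One practical caveat not mentioned in the answer (my addition): JAX computes in float32 by default, which limits the gradient precision SciPy's line searches see. Enabling double precision before running any JAX computation avoids this:

from jax import config

# make JAX use float64, matching NumPy/SciPy precision (must run before any JAX computation)
config.update("jax_enable_x64", True)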

PS: You can also try the state-of-the-art Ipopt solver through the cyipopt package, which exposes an interface similar to scipy.optimize.minimize.
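
A minimal sketch of that interface (my illustration, assuming cyipopt is installed; minimize_ipopt is cyipopt's SciPy-style wrapper, and the gradient is supplied as a separate callable built with jax.grad):

import numpy as np
from cyipopt import minimize_ipopt
from jax import grad, jit

# reuse the JAX costFunction defined above; supply its gradient explicitly
grad_fn = jit(grad(costFunction))

result = minimize_ipopt(costFunction,
                        x0=startingWeights.flatten(),
                        jac=lambda w: np.asarray(grad_fn(w)))
print(result.x.shape)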

Upvotes: 4
