alonrieger

Reputation: 23

IRLS vs. Linear Programming for Large-Scale Least Absolute Deviation (LAD) Regression

I was trying to run a Least Absolute Deviation regression (L1 regression). I formulated it as a linear program and solved it with CVXOPT (in Python). The number of features (including the constant) was 13 and the number of samples was ~100K. It took many hours to run to completion. Then I realized that I could run statsmodels' QuantReg with q=0.5; it took a few seconds. I read that it uses Iteratively Reweighted Least Squares (IRLS).

Is IRLS inherently faster than linear programming for this purpose, or would a different formulation of the linear optimization problem, together with a better solver such as Gurobi, reduce the running time of the linear program substantially?

The reason I am asking is that I want to try a loss function that is a weighted sum of two different absolute-value terms. I believe I could employ IRLS for that problem as well, but I am not sure, and I am more comfortable with linear programming.
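For reference, a loss that is a weighted sum of two absolute-value terms stays expressible as a linear program via the same epigraph trick used for plain LAD: each absolute value gets its own bound variable. Assuming, purely for illustration, a loss of the form $w_1 \sum_i |y_i - x_i^T \beta| + w_2 \sum_i |z_i - x_i^T \beta|$ (the actual second term is not specified above, so $z$ is a placeholder target):

```latex
\min_{\beta,\, t,\, s} \; w_1 \mathbf{1}^{T} t + w_2 \mathbf{1}^{T} s
\quad \text{s.t.} \quad
-t \le y - X \beta \le t ,
\qquad
-s \le z - X \beta \le s
```

This roughly doubles the number of auxiliary variables and constraints relative to plain LAD, but the LP structure is unchanged.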

The formulation I used for the linear program was as follows:

$$ \min_{v,\, \beta} \; \mathbf{1}^{T} v \quad \text{s.t.} \quad -v \le y - X \beta \le v $$

The resulting code for CVXOPT was as follows:

import numpy as np
from scipy.sparse import hstack, vstack, identity
from cvxopt import matrix, spmatrix, solvers

def scipy_sparse_to_spmatrix(A):
    '''Convert a SciPy sparse matrix to a CVXOPT spmatrix.'''
    coo = A.tocoo()
    return spmatrix(coo.data.tolist(), coo.row.tolist(), coo.col.tolist(), size=A.shape)

def l1_fit(x, y):
    '''
    :param x: feature matrix (n samples x p features)
    :param y: label vector (length n)
    :return: LAD coefficients

    The problem I construct, over the stacked variable (v|A), is:
        min c'*(v|A)
        subject to:
            (-I| x)*(v|A) <= y
            (-I|-x)*(v|A) <= -y
    so that v bounds the absolute residuals |y - x*A|.
    '''
    n, p = y.shape[0], x.shape[1]
    # objective: minimize the sum of the residual bounds v, ignore A
    c = matrix(np.concatenate((np.ones(n), np.zeros(p))).tolist(), (n + p, 1), 'd')
    # constraint matrix: 2n rows (one pair per sample), n + p columns
    G = scipy_sparse_to_spmatrix(
        hstack((vstack((-identity(n), np.transpose(x))),
                vstack((-identity(n), np.transpose(-x))))).T.astype(float))
    h = matrix(np.concatenate((y, -y)).tolist(), (2 * n, 1), 'd')
    sol = solvers.lp(c, G, h)
    return sol['x'][n:]

Upvotes: 2

Views: 521

Answers (1)

Royi

Reputation: 4963

If you have access to Gurobi or CPLEX, then you should follow the analysis in Erwin Kalvelagen - Linear Programming and LAD Regression.

It shows how best to formulate the problem as a Linear Program; a standard formulation is the variable-splitting form

$$ \min_{\beta,\, e^{+},\, e^{-}} \; \mathbf{1}^{T} \left( e^{+} + e^{-} \right) \quad \text{s.t.} \quad X \beta + e^{+} - e^{-} = y , \quad e^{+}, e^{-} \ge 0 $$

(see the linked post for the full set of formulations and their comparison).

Yet, I'm not sure converting the model into an LP problem is the way to go. I generated a model matrix $\boldsymbol{A} \in \mathbb{R}^{100,000 \times 15}$.

I solved the problem in its direct form using Convex.jl (the Julia equivalent of CVXPY) with the ECOS solver (which should also be available in CVXPY).
The problem was solved in under 50 [Sec], so about 1-2 orders of magnitude faster than your hours-long run.

Moreover, I implemented a solver based on Iteratively Reweighted Least Squares (IRLS). I could solve the problem in less than a second.


The Convex.jl (ECOS) solution run time: 44.50541 [Sec]
The IRLS solution run time: 0.22733 [Sec]

The cause of the speed is that the most expensive operation in IRLS, solving a linear system, involves only A' * A, which in your case is tiny (15 x 15).

I implemented it in Julia with a relatively efficient IRLS routine I have written (no allocations).
You could probably imitate it pretty closely in Python using NumPy, SciPy and Numba.
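Not the Julia code itself, but a minimal NumPy sketch of IRLS for LAD (the damping constant eps and the fixed iteration count are illustrative choices, not taken from the answer):

```python
import numpy as np

def lad_irls(X, y, n_iter=100, eps=1e-8):
    """LAD regression via iteratively reweighted least squares.

    Each iteration solves a weighted least-squares problem whose
    normal equations involve only the small p x p matrix X' W X.
    """
    # start from the ordinary least-squares solution
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        # weights 1/|r| turn the squared loss into an absolute loss;
        # eps guards against division by (near-)zero residuals
        w = 1.0 / np.maximum(np.abs(r), eps)
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

# intercept-only LAD recovers the median
X = np.ones((3, 1))
y = np.array([0.0, 1.0, 10.0])
print(lad_irls(X, y))  # close to [1.0], the median of y
```

For the two-term loss in the question, the same scheme would use weights w1/|r1| and w2/|r2| on the two stacked residual blocks.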

So solving your problem is pretty much an hour of coding away.

The code is available on my StackOverflow GitHub Repository (Look at the StackOverflow\Q64422417 folder).

Upvotes: 0
