amitkaps

Reputation: 73

Ridge Regression: Scikit-learn vs. direct calculation does not match for alpha > 0

In Ridge Regression, we are solving Ax = b with L2 regularization. The direct calculation is given by:

x = (A^T A + alpha * I)^-1 A^T b

I have looked at the scikit-learn code, and it does implement the same calculation. But I can't seem to get the same results for alpha > 0.

Here is minimal code to reproduce this:

import numpy as np
A = np.asmatrix(np.c_[np.ones((10,1)),np.random.rand(10,3)])
b = np.asmatrix(np.random.rand(10,1))
I = np.identity(A.shape[1])
alpha = 1
x = np.linalg.inv(A.T*A + alpha * I)*A.T*b
print(x.T)
>>> [[ 0.37371021  0.19558433  0.06065241  0.17030177]]

from sklearn.linear_model import Ridge
model = Ridge(alpha = alpha).fit(A[:,1:],b)
print(np.c_[model.intercept_, model.coef_])
>>> [[ 0.61241566  0.02727579 -0.06363385  0.05303027]]

Any suggestions on what I can do to resolve this discrepancy?

Upvotes: 7

Views: 4248

Answers (1)

JARS

Reputation: 1119

This modification seems to yield the same result for the direct (NumPy) calculation and the scikit-learn version:

import numpy as np
A = np.asmatrix(np.random.rand(10,3))
b = np.asmatrix(np.random.rand(10,1))
I = np.identity(A.shape[1])
alpha = 1
x = np.linalg.inv(A.T*A + alpha * I)*A.T*b
print(x.T)

from sklearn.linear_model import Ridge
model = Ridge(alpha=alpha, tol=0.1, fit_intercept=False).fit(A, b)
print(model.coef_)
print(model.intercept_)

The main reason for the difference seems to be that the Ridge class has the parameter fit_intercept=True by default (inherited from the class _BaseRidge) (source).

This applies a data-centering procedure before passing the matrices to the _solve_cholesky function.

Here's the line in ridge.py that does it:

    X, y, X_mean, y_mean, X_std = self._center_data(
        X, y, self.fit_intercept, self.normalize, self.copy_X,
        sample_weight=sample_weight)

Also, it seems you were trying to implicitly account for the intercept by adding a column of 1's. As you can see, this is not necessary if you specify fit_intercept=False.
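To make the centering explicit, here is a sketch (not scikit-learn's actual code) showing that centering X and y by hand, applying the direct formula to the centered data, and recovering the intercept as y_mean - X_mean · coef reproduces Ridge(fit_intercept=True):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(10, 3)
y = rng.rand(10)
alpha = 1.0

# Center the data, as sklearn does when fit_intercept=True
X_mean = X.mean(axis=0)
y_mean = y.mean()
Xc = X - X_mean
yc = y - y_mean

# Direct ridge solution on the centered data (the intercept is not penalized)
I = np.identity(X.shape[1])
coef = np.linalg.solve(Xc.T @ Xc + alpha * I, Xc.T @ yc)
intercept = y_mean - X_mean @ coef

model = Ridge(alpha=alpha).fit(X, y)
print(np.allclose(coef, model.coef_))
print(np.allclose(intercept, model.intercept_))
```

Both checks should print True: the intercept is fitted implicitly through the centering rather than through a penalized column of 1's, which is exactly why the column-of-1's approach in the question disagrees with scikit-learn.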

Appendix: Does the Ridge class actually implement the direct formula?

It depends on the choice of the solver parameter.

Effectively, if you do not specify the solver parameter in Ridge, it defaults to solver='auto' (which internally resorts to solver='cholesky'). This should be equivalent to the direct computation.

Strictly speaking, _solve_cholesky uses numpy.linalg.solve instead of numpy.linalg.inv. But it can easily be checked that

np.linalg.solve(A.T*A + alpha * I, A.T*b)

yields the same as

np.linalg.inv(A.T*A + alpha * I)*A.T*b
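A quick check of that equivalence (on arbitrary random data, so the variable names here are just for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(10, 3)
b = rng.rand(10, 1)
alpha = 1.0
I = np.identity(A.shape[1])

# Solve the regularized normal equations two ways
x_solve = np.linalg.solve(A.T @ A + alpha * I, A.T @ b)
x_inv = np.linalg.inv(A.T @ A + alpha * I) @ (A.T @ b)
print(np.allclose(x_solve, x_inv))  # True
```

np.linalg.solve is preferred in practice because it avoids forming the explicit inverse, which is both cheaper and numerically more stable.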

Upvotes: 6
