txizzle

Reputation: 840

Sklearn Lasso Regression is orders of magnitude worse than Ridge Regression?

I've implemented Ridge and Lasso regression using the sklearn.linear_model module.

However, Lasso regression seems to perform about three orders of magnitude worse on the same dataset!

I'm not sure what's wrong, because mathematically this shouldn't be happening. Here's my code:

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split


def ridge_regression(X_train, Y_train, X_test, Y_test, model_alpha):
    clf = linear_model.Ridge(alpha=model_alpha)
    clf.fit(X_train, Y_train)
    predictions = clf.predict(X_test)
    loss = np.sum((predictions - Y_test) ** 2)  # sum of squared errors on the test set
    return loss


def lasso_regression(X_train, Y_train, X_test, Y_test, model_alpha):
    clf = linear_model.Lasso(alpha=model_alpha)
    clf.fit(X_train, Y_train)
    predictions = clf.predict(X_test)
    loss = np.sum((predictions - Y_test) ** 2)  # sum of squared errors on the test set
    return loss


# train_test_split lives in sklearn.model_selection (the old cross_validation module was removed)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=0)

for alpha in [0, 0.01, 0.1, 0.5, 1, 2, 5, 10, 100, 1000, 10000]:
    print("Lasso loss for alpha=" + str(alpha) + ": " + str(lasso_regression(X_train, Y_train, X_test, Y_test, alpha)))

for alpha in [1, 1.25, 1.5, 1.75, 2, 5, 10, 100, 1000, 10000, 100000, 1000000]:
    print("Ridge loss for alpha=" + str(alpha) + ": " + str(ridge_regression(X_train, Y_train, X_test, Y_test, alpha)))

And here's my output:

Lasso loss for alpha=0: 20575.7121727
Lasso loss for alpha=0.01: 19762.8763969
Lasso loss for alpha=0.1: 17656.9926418
Lasso loss for alpha=0.5: 15699.2014387
Lasso loss for alpha=1: 15619.9772649
Lasso loss for alpha=2: 15490.0433166
Lasso loss for alpha=5: 15328.4303197
Lasso loss for alpha=10: 15328.4303197
Lasso loss for alpha=100: 15328.4303197
Lasso loss for alpha=1000: 15328.4303197
Lasso loss for alpha=10000: 15328.4303197
Ridge loss for alpha=1: 61.6235890425
Ridge loss for alpha=1.25: 61.6360790934
Ridge loss for alpha=1.5: 61.6496312133
Ridge loss for alpha=1.75: 61.6636076713
Ridge loss for alpha=2: 61.6776331539
Ridge loss for alpha=5: 61.8206621527
Ridge loss for alpha=10: 61.9883144732
Ridge loss for alpha=100: 63.9106882674
Ridge loss for alpha=1000: 69.3266510866
Ridge loss for alpha=10000: 82.0056669678
Ridge loss for alpha=100000: 88.4479064159
Ridge loss for alpha=1000000: 91.7235727543

Any idea why?

Thanks!

Upvotes: 5

Views: 1926

Answers (1)

DevShark

Reputation: 9112

Interesting problem. I can confirm that this is not an issue with the implementation of the algorithm; it is the correct response to your input.

Here's a thought: judging from your description, I believe you are not normalizing the data. That can lead to instability, since your features have significantly different orders of magnitude and variance. Lasso is more "all or nothing" than ridge (you've probably noticed it sets many more coefficients to exactly zero), so that instability is magnified.
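To see the "all or nothing" behavior in isolation, here's a toy sketch (synthetic data, not your dataset; the coefficient vector and alpha are just illustrative choices) comparing the fitted coefficients of the two estimators:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data, all features on the same scale: only two of the
# five true coefficients are nonzero.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + 0.1 * rng.randn(100)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Lasso drives the irrelevant coefficients to exactly 0;
# Ridge only shrinks them toward 0.
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
```

With features on a common scale this sparsity is usually what you want; with badly scaled features the same thresholding can wipe out genuinely useful coefficients.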

Try normalizing your data and see if you like the results better.
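For example, a minimal sketch of what "normalize first" looks like in sklearn, again on synthetic data (the scale factors and alpha are made up for illustration): each feature contributes equally to y, but the raw scales differ by six orders of magnitude, so unscaled Lasso zeroes out the tiny-scale feature while the scaled pipeline keeps it.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Three features on wildly different scales, each contributing
# equally to y.
rng = np.random.RandomState(0)
X = rng.randn(200, 3) * np.array([1.0, 1e3, 1e-3])
y = X @ np.array([2.0, 0.002, 2000.0]) + 0.1 * rng.randn(200)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

raw = Lasso(alpha=0.1).fit(X_train, Y_train)
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X_train, Y_train)

# Same estimator, same alpha; only the scaling differs.
print("unscaled test loss:", np.sum((raw.predict(X_test) - Y_test) ** 2))
print("scaled test loss:  ", np.sum((scaled.predict(X_test) - Y_test) ** 2))
```

Using a Pipeline also keeps things honest: the scaler is fit on the training split only, so no test-set statistics leak into the model.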

Another thought: this might be intentional on the part of the Berkeley teachers, to highlight the fundamentally different behavior of ridge and lasso.

Upvotes: 4
