Here is some basic code generating a regression problem.
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression

n = 100
np.random.seed(42)

# True model: cubic polynomial in X plus Gaussian noise
X = np.random.normal(size=n)
eps = np.random.normal(size=n)
b_0, b_1, b_2, b_3 = 0.5, 2.8, 6.7, 3.4
Y = b_0 + b_1 * X + b_2 * (X ** 2) + b_3 * (X ** 3) + eps

# Design matrix with powers of X up to X^10 (the powers above 3 are redundant)
n_cols = 10
df_X = pd.DataFrame(columns=[f'X ^ {i}' for i in range(1, n_cols + 1)])
for i in range(1, n_cols + 1):
    df_X[f'X ^ {i}'] = X ** i
I was looking at how Lasso shrinks the coefficients when I noticed something odd. The coefficients for the X variables raised to powers greater than 3 (variables that are redundant to the true model but increase the variance of the fit) are almost exactly 0 for alpha/lambda equal to 0. I checked what the coefficients are for OLS, and they are very much non-zero:
| | betas OLS | betas Lasso |
|---|---|---|
| const | 0.3 | 0.3 |
| X^1 | 2.2 | 2.5 |
| X^2 | 5.8 | 6.9 |
| X^3 | 5.3 | 3.7 |
| X^4 | 3.0 | -0.03 |
| X^5 | -2.0 | -0.14 |
| X^6 | -2.4 | 0 |
| X^7 | 0.8 | 0 |
| X^8 | 0.7 | 0 |
| X^9 | -0.1 | 0 |
| X^10 | -0.07 | 0 |
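For reference, here is a minimal sketch of one way such a comparison can be produced, assuming the df_X and Y defined above (the near-zero alpha of 1e-5 is just illustrative, not exactly what I used):

```python
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression

# OLS fit on the full polynomial design matrix
ols = LinearRegression().fit(df_X, Y)

# Lasso with a near-zero alpha (alpha=0 itself is discouraged by sklearn)
lasso = Lasso(alpha=1e-5).fit(df_X, Y)

print('const:', round(ols.intercept_, 2), round(lasso.intercept_, 2))
print(pd.DataFrame({'betas OLS': ols.coef_, 'betas Lasso': lasso.coef_},
                   index=df_X.columns).round(2))
```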
Theoretically, Lasso with alpha/lambda = 0 should produce the same results as OLS. I found that a similar question was asked here, and I understand it, as the documentation for Lasso states:
> When alpha = 0, the objective is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised. Instead, you should use the LinearRegression object.
Ok, sure, that makes sense, numerical issues. But why are the results so different even when alpha is not equal to zero, but, for example, 0.00001, 0.001, or 0.1?
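For context, this is the objective that sklearn's Lasso minimizes (per its documentation):

$$\min_{w} \; \frac{1}{2\,n_{\text{samples}}}\,\lVert y - Xw\rVert_2^2 + \alpha\,\lVert w\rVert_1$$

so with alpha = 0 (or a tiny alpha) only the least-squares term should matter, and I would expect the solution to essentially coincide with OLS.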
If you look at the plot of shrinking coefficients I've produced, you will notice that the initially large coefficients for X^4/5/6/7/8 are completely absent from what Lasso produces. Or maybe they begin to come into the picture around lambda ≈ 0.5, but that is also where they start being shrunk.
Code for generating the plot:
import matplotlib.pyplot as plt

# Fit Lasso over a grid of lambda/alpha values and record the coefficient path
lambdas = np.concatenate((np.linspace(0, 1 - 1e-8, 1000), np.linspace(1, 200, 1000)))
coefs_path = []
for l in lambdas:
    model = Lasso(alpha=l)
    model.fit(df_X, Y)
    coefs_path.append(model.coef_)
coefs_path = np.array(coefs_path)

for i_var in range(df_X.shape[1]):
    plt.plot(lambdas, coefs_path[:, i_var], label=f'$X^{{{i_var+1}}}$')
plt.title('Shrinking coefficients')
plt.xlabel(r'$\lambda$')
plt.xlim(left=-1, right=10)
# I have a different part of code where I use sklearn's LassoCV, but that is irrelevant here.
lasso_cv_alpha_ = 0.28
plt.axvline(lasso_cv_alpha_, lw=2, linestyle='--', color='#49eb34',
            label=rf'Best coefficients ($\lambda$ = {lasso_cv_alpha_:.3f})')
plt.legend();
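As an aside, the same kind of coefficient path can also be computed in one call with sklearn's lasso_path; a rough sketch (it does not handle an intercept, so I centre the data first, and the returned alphas may come back reordered with the largest first):

```python
from sklearn.linear_model import lasso_path

# lasso_path has no fit_intercept, so centre the data to mimic Lasso(fit_intercept=True)
Xc = df_X.values - df_X.values.mean(axis=0)
Yc = Y - Y.mean()

# coefs has shape (n_features, n_alphas); alpha=0 is excluded, as above
alphas_out, coefs, _ = lasso_path(Xc, Yc, alphas=lambdas[lambdas > 0])
```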
Is this expected behaviour? It doesn't feel like it is.
Edit based on Jordi Pastor's comments:
Indeed, changing the Lasso model parameters to Lasso(alpha=l, max_iter=int(1e6), tol=1e-16) produces non-zero coefficients for the X^4/5/6/7/8/9/10 variables. They are still way off compared to OLS, as presented below:
| | betas OLS | betas Lasso (tol & max_iter) |
|---|---|---|
| const | 0.3 | 0.3 |
| X^1 | 2.2 | 2.7 |
| X^2 | 5.8 | 6.0 |
| X^3 | 5.3 | 2.8 |
| X^4 | 3.0 | 2.1 |
| X^5 | -2.0 | 1.0 |
| X^6 | -2.4 | -1.3 |
| X^7 | 0.8 | -0.5 |
| X^8 | 0.7 | 0.2 |
| X^9 | -0.1 | 0.07 |
| X^10 | -0.07 | 0 |
Here is an updated plot of the shrinking coefficients with the changed model parameters. I've changed the range on the x-axis for better visibility of the small alpha/lambda values. Again, the results are odd to me: the coefficients start at the values shown in the betas Lasso column above, but they drop almost immediately to zero. To be clear, the max_iter and tol parameters were set for all model runs. To be honest, this is now even more odd to me.
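A small sketch of how I could check whether the solver actually converges at these small alphas (assuming the df_X and Y from above; the alpha values are just examples):

```python
import warnings
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso

# Flag any alpha for which coordinate descent does not converge with default settings
for alpha in [1e-5, 1e-3, 0.1]:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always', ConvergenceWarning)
        Lasso(alpha=alpha).fit(df_X, Y)
    converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
    print(f'alpha={alpha}: converged with defaults = {converged}')
```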