David

Don't get the correct lasso MSE

I'm new to regression, and I wrote a very simple MATLAB script that uses the lasso function, just to check that I understand how lasso's MSE is calculated. But the MSE I get is different from lasso's output. I'm probably missing something, and I'd appreciate it if anyone here could tell me where I'm wrong. To calculate the MSE, I used the following formula from this link: https://www.mathworks.com/help/stats/lasso.html

$$\min_{\beta_0,\beta}\left(\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{T}\beta\right)^2+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert\right)$$

And here's the MATLAB code that I wrote:

clear;
close all;
clc;

% Checking lasso MSE from this link:
% https://www.mathworks.com/help/stats/lasso.html

n = 10;
p = 3;
X = 20*rand(n,p);
min_val = -20;
max_val = 20;  
y = min_val + (max_val - min_val)*rand(n,1);

lambda_vals = [0.2, 0.8, 1, 1.5];
[beta_vectors , FitInfo] = lasso(X, y, 'Lambda', lambda_vals);

eps = 10^-10;
num_of_lambda_vals = length(lambda_vals);
for i=1:num_of_lambda_vals 
    current_calculated_mse = sum((y - FitInfo.Intercept(i) - X*beta_vectors(:,i)).^2)/(2*n) +...
        lambda_vals(i)*sum(abs(beta_vectors(:,i)));    
    current_mse = FitInfo.MSE(i);

    fprintf('current_calculated_mse = %f\n',current_calculated_mse);
    fprintf('current_mse = %f\n',current_mse);
    sqr_diff_mses = (current_calculated_mse-current_mse)^2;
    if (sqr_diff_mses > eps)
        fprintf('The calculated MSE is wrong!\n');
    end
    fprintf('\n');
end

If you run the code, it prints that the calculated MSE is wrong. Can anybody tell me what is wrong with my code?

Thanks


Answers (1)

Ander Biguri

You are just using the wrong equation.

When you set up a minimization problem, you add regularization and other terms to the function being minimized; in your case, that function is the equation you shared.

However, when you want to validate your results, you are only interested in how far your solution, applied to the model, is from the real data. This means that when you compute the error (MSE in this case), you just need:

$$\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2$$

where your solution applied to the model is

$$\hat{y}_i=\beta_0+x_i^{T}\beta$$
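Writing the question's quantity in terms of this MSE shows exactly why the two numbers differ: the question computes half the MSE plus the L1 penalty. This is a worked identity for the plain lasso objective (no observation weights, Alpha = 1), with N observations (n in the code) and p predictors:

```latex
\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{T}\beta\right)^2+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
=\frac{1}{2}\cdot\underbrace{\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}_{\mathrm{MSE}}+\lambda\lVert\beta\rVert_1,
\qquad \hat{y}_i=\beta_0+x_i^{T}\beta
```

So the question's number equals FitInfo.MSE(i)/2 + lambda_vals(i)*sum(abs(beta_vectors(:,i))), and it matches FitInfo.MSE(i) only by coincidence. One caveat, as I understand the MathWorks documentation: lasso standardizes X by default ('Standardize' is true) and applies the penalty on the standardized scale, so this expression evaluated with the returned original-scale coefficients is not necessarily the exact value lasso minimized.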

In short: Change current_calculated_mse to

current_calculated_mse = sum((y - FitInfo.Intercept(i) - X*beta_vectors(:,i)).^2)/n;

Outputs:

current_calculated_mse = 116.748997
current_mse = 116.748997

current_calculated_mse = 122.421290
current_mse = 122.421290

current_calculated_mse = 125.824726
current_mse = 125.824726

current_calculated_mse = 137.641287
current_mse = 137.641287
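To run the same check for every lambda at once, here is a vectorized sketch. It assumes the Statistics and Machine Learning Toolbox (for lasso) and MATLAB R2016b or later (implicit expansion, used when the 1-by-L FitInfo.Intercept row is subtracted from n-by-L arrays); the fixed seed and the variable names are illustrative choices, not part of the original code. It also prints the question's quantity next to the MSE so the difference is visible.

```matlab
% Sketch: compare FitInfo.MSE with the mean squared residual for all Lambdas at once.
% Assumes the Statistics and Machine Learning Toolbox and MATLAB R2016b+.
rng(0);                                   % fixed seed so the run is repeatable
n = 10;  p = 3;
X = 20*rand(n, p);
y = -20 + 40*rand(n, 1);

[B, FitInfo] = lasso(X, y, 'Lambda', [0.2, 0.8, 1, 1.5]);

% n-by-L residual matrix: column i uses Intercept(i) and B(:,i)
residuals = y - FitInfo.Intercept - X*B;

% (1/n) * sum of squared residuals, one value per Lambda
mse_calc = mean(residuals.^2, 1);

% The question's quantity: half the MSE plus the L1 penalty, per Lambda
objective = mse_calc/2 + FitInfo.Lambda .* sum(abs(B), 1);

disp(table(FitInfo.Lambda', mse_calc', FitInfo.MSE', objective', ...
    'VariableNames', {'Lambda', 'MSE_calc', 'FitInfo_MSE', 'Objective'}));
```

Pairing each column of B with FitInfo.Lambda, rather than with the input vector, avoids relying on the order in which the lambdas were passed.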

Why not use the minimizing equation for the error? It makes sense if we are minimizing that!

Yes! And no. You propose a minimizing equation to steer the solution toward some desired properties; in the case of lasso, you want as many of the beta values as possible to be zero. But that does not mean that your solution is any good.
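To see that "desired property" at work, the sketch below fits lasso over a wider range of lambdas and prints how many coefficients remain nonzero (FitInfo.DF) next to the training MSE. It assumes the Statistics and Machine Learning Toolbox; the data are random as in the question, and the lambda grid (including the deliberately huge value 1000, which should zero out every coefficient) is an arbitrary illustrative choice.

```matlab
% Sketch: larger Lambda -> fewer nonzero coefficients, but a worse fit to y.
% Assumes the Statistics and Machine Learning Toolbox; the Lambda grid is arbitrary.
rng(1);                                   % fixed seed so the run is repeatable
n = 10;  p = 3;
X = 20*rand(n, p);
y = -20 + 40*rand(n, 1);

[B, FitInfo] = lasso(X, y, 'Lambda', [0.01, 0.2, 1, 5, 1000]);

% FitInfo.DF(i) is the number of nonzero coefficients in B(:,i)
for i = 1:numel(FitInfo.Lambda)
    fprintf('Lambda = %8.2f   nonzero betas = %d   MSE = %9.4f\n', ...
        FitInfo.Lambda(i), FitInfo.DF(i), FitInfo.MSE(i));
end
```

With exact minimizers, the training MSE cannot decrease as lambda grows, which is why a sparser solution is not automatically a better fit.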

You could just as well have minimized an equation such as argmin_beta 0*(beta*x - y)^2 + (1 - beta)^2. Solving it gives beta = 1 as a perfect minimizer, but does that mean your solution is perfect? No, not at all! You just chose a bad function to minimize. What you want is the beta that best fits the real data (y). Your case is the same, because you are using a varying set of lambdas: as the outputs above show, the larger lambdas also solve their own minimization problem, but their solutions fit the real data worse (the MSE rises from about 116.7 to 137.6). Comparing the MSE is how you choose the best lambda.
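The "bad objective" argument can be checked numerically. This sketch uses a squared penalty (1 - beta)^2 so that beta = 1 is a genuine minimizer, multiplies the data term by zero, and compares the result with an ordinary least-squares fit. The noise-free data with slope 3 are hypothetical, chosen only for illustration; the code uses only base MATLAB (fminsearch and the backslash operator).

```matlab
% Sketch of the "bad objective" argument. Hypothetical data: y = 3*x, no noise.
x = (1:10)';
y = 3*x;

% The data term is multiplied by zero, so the minimizer ignores y entirely
bad_objective = @(beta) 0*sum((beta*x - y).^2) + (1 - beta)^2;
beta_bad = fminsearch(bad_objective, 0);   % converges to beta close to 1
beta_ls  = x \ y;                           % ordinary least squares: beta = 3

mse_bad = mean((y - beta_bad*x).^2);        % large: the bad objective ignores the data
mse_ls  = mean((y - beta_ls*x).^2);         % essentially zero

fprintf('bad objective: beta = %.4f, objective = %.2e, MSE = %.2f\n', ...
    beta_bad, bad_objective(beta_bad), mse_bad);
fprintf('least squares: beta = %.4f, MSE = %.2e\n', beta_ls, mse_ls);
```

The bad objective reaches (almost) zero at beta = 1, yet its MSE on the data is large, while the least-squares fit has an MSE of essentially zero: a low objective value says nothing about fit unless the objective measures fit.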
