KAICHENG WANG

Reputation: 61

How should I get the coefficients of a lasso model?

Here is my code:

library(MASS)   # Boston housing data
library(caret)

df <- Boston
set.seed(3721)

# Note: createFolds() returns the held-out indices by default, while
# trainControl(index = ...) expects the training indices, so each resample
# here trains on roughly one tenth of the data (hence the ~51-row sample
# sizes in the output below); pass returnTrain = TRUE to train on 9/10 folds.
cv.10.folds <- createFolds(df$medv, k = 10)

lasso_grid <- expand.grid(fraction = c(1, 0.1, 0.01, 0.001))
lasso <- train(medv ~ .,
               data = df,
               preProcess = c("center", "scale"),
               method = "lasso",
               tuneGrid = lasso_grid,
               trControl = trainControl(method = "cv",
                                        number = 10,
                                        index = cv.10.folds))

lasso

Unlike with a linear model, I cannot find the coefficients of the lasso regression model from summary(lasso). How can I get them? Or could I use glmnet instead?

Upvotes: 6

Views: 8170

Answers (2)

Maurizio

Reputation: 1

I noticed there can be issues with the approach above if you define your own grid for hyperparameter tuning: predict.enet appears to impose its own grid, which often does not correspond to the grid defined for train().

If this is the case, you can set the mode argument to "fraction" and supply the vector of fractions from the train() output to the s argument:

predict(lasso$finalModel, type = "coef", mode = "fraction", s = lasso$bestTune)

"S" can also be just your optimal tuning parameter, determined with train():

predict(lasso$finalModel, type = "coef", mode = "fraction", s = as.numeric(lasso$bestTune))
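
The as.numeric() is needed because lasso$bestTune is a one-row data frame rather than a bare number; printing it (for the model above, where the selected fraction is 1) shows:

lasso$bestTune
  fraction
1        1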

Created on 2020-09-11 by the reprex package (v0.3.0)

Upvotes: 0

StupidWolf

Reputation: 46908

When you train with method = "lasso", enet() from the elasticnet package is called (with lambda = 0, i.e. a pure lasso fit):

lasso$finalModel$call
elasticnet::enet(x = as.matrix(x), y = y, lambda = 0)

And the vignette says:

The LARS-EN algorithm computes the complete elastic net solution simultaneously for ALL values of the shrinkage parameter in the same computational cost as a least squares fit

Under lasso$finalModel$beta.pure you have all 16 sets of coefficients, corresponding to the 16 values of the L1 norm stored in lasso$finalModel$L1norm:

length(lasso$finalModel$L1norm)
[1] 16

dim(lasso$finalModel$beta.pure)
[1] 16 13

You can see them using predict() too:

predict(lasso$finalModel, type = "coef")
$s
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

$fraction
 [1] 0.00000000 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333
 [7] 0.40000000 0.46666667 0.53333333 0.60000000 0.66666667 0.73333333
[13] 0.80000000 0.86666667 0.93333333 1.00000000

$mode
[1] "step"

$coefficients
          crim        zn       indus      chas        nox       rm        age
0   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 0.000000 0.00000000
1   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 0.000000 0.00000000
2   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 1.677765 0.00000000
3   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 2.571071 0.00000000
4   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 2.716138 0.00000000
5   0.00000000 0.0000000  0.00000000 0.2586083  0.0000000 2.885615 0.00000000
6  -0.05232643 0.0000000  0.00000000 0.3543411  0.0000000 2.953605 0.00000000
7  -0.13286554 0.0000000  0.00000000 0.4095229  0.0000000 2.984026 0.00000000
8  -0.21665925 0.0000000  0.00000000 0.5196189 -0.5933941 3.003512 0.00000000
9  -0.32168140 0.3326103  0.00000000 0.6044308 -1.0246080 2.973693 0.00000000
10 -0.33568474 0.3771889 -0.02165730 0.6165190 -1.0728128 2.967696 0.00000000
11 -0.42820289 0.4522827 -0.09212253 0.6407298 -1.2474934 2.932427 0.00000000
12 -0.62605363 0.7005114  0.00000000 0.6574277 -1.5655601 2.832726 0.00000000
13 -0.88747102 1.0150162  0.00000000 0.6856705 -1.9476465 2.694820 0.00000000
14 -0.91679342 1.0613165  0.09956489 0.6837833 -2.0217269 2.684401 0.00000000
15 -0.92906457 1.0826390  0.14103943 0.6824144 -2.0587536 2.676877 0.01948534

The hyperparameter tuned by caret is the fraction of the maximum L1 norm, so in the result you have provided it will be 1, i.e. the maximum:

lasso
The lasso 

506 samples
 13 predictor

Pre-processing: centered (13), scaled (13) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 51, 51, 51, 50, 51, 50, ... 
Resampling results across tuning parameters:

  fraction  RMSE      Rsquared   MAE     
  0.001     9.182599  0.5075081  6.646013
  0.010     9.022117  0.5075081  6.520153
  0.100     7.597607  0.5572499  5.402851
  1.000     6.158513  0.6033310  4.140362

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was fraction = 1.

To get the coefficients for the optimal fraction, request the corresponding step (step 16 corresponds to fraction = 1, as shown above):

predict(lasso$finalModel, type = "coef", s = 16)
$s
[1] 16

$fraction
[1] 1

$mode
[1] "step"

$coefficients
       crim          zn       indus        chas         nox          rm 
-0.92906457  1.08263896  0.14103943  0.68241438 -2.05875361  2.67687661 
        age         dis         rad         tax     ptratio       black 
 0.01948534 -3.10711605  2.66485220 -2.07883689 -2.06264585  0.85010886 
      lstat 
-3.74733185
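
Since the question also asks about glmnet: here is a minimal sketch of an equivalent pure-lasso fit with that package (cv.glmnet tunes lambda by cross-validation; the object names are illustrative):

library(glmnet)
library(MASS)
x <- as.matrix(Boston[, -14])        # all 13 predictors (medv is column 14)
y <- Boston$medv
cvfit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 selects the lasso penalty
coef(cvfit, s = "lambda.min")        # coefficients at the CV-optimal lambda

Note that glmnet standardizes the predictors internally by default but reports coefficients on the original scale, unlike the centered-and-scaled coefficients shown above.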

Upvotes: 2
