Reputation: 2273
I would like to compare my elastic net model to an unregularized model. For the sake of fairness and simplicity, I would like to train both models using the glmnet package. However, I recently discovered that glmnet overrides lambda = 0 on some datasets. How can I force glmnet to behave like glm?
x <- structure(c(0.028, 0.023, 0.0077, 0.14, 0.027, 0.084, 0.018,
0.055, 0.0089, 0.016, 0.037, 0.043, 0.046, 0.031, 0.034, 0.056,
0.016, 0.048, 0.013, 0.02, 0.067, 0.046, 0.058, 0.054, 0.036,
0.043, 0.009, 0.12, 0.024, 0.018, 0.066, 0.046, 0.057, 0.054,
0.036, 0.043, 0.009, 0.12, 0.024, 0.018, 0.051, 0.043, 0.047,
0.045, 0.034, 0.04, 0.009, 0.085, 0.022, 0.016, 0.028, 0.023,
0.0089, 0.14, 0.028, 0.084, 0.02, 0.055, 0.0089, 0.016, 0.067,
0.049, 0.058, 0.055, 0.038, 0.043, 0.009, 0.12, 0.024, 0.018,
0.067, 0.046, 0.058, 0.054, 0.036, 0.043, 0.009, 0.12, 0.024,
0.018), .Dim = c(10L, 8L), .Dimnames = list(NULL, NULL))
y <- gl(2, 5)
fit <- glmnet::glmnet(x, y, family = "binomial", lambda = 0)
fit$lambda # should be 0, but is actually infinite
Warning messages:
1: In lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
one multinomial or binomial class has fewer than 8 observations; dangerous ground
2: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
3: In getcoef(fit, nvars, nx, vnames) :
an empty model has been returned; probably a convergence issue
Upvotes: 0
Views: 324
Reputation: 46978
In your example, some of the predictors are highly correlated; between columns 3, 4, 7, and 8, for example, the values are essentially the same:
round(cor(x),3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1.000 0.308 0.290 0.294 0.330 1.000 0.292 0.290
[2,] 0.308 1.000 0.653 0.652 0.730 0.306 0.660 0.653
[3,] 0.290 0.653 1.000 1.000 0.989 0.285 0.999 1.000
[4,] 0.294 0.652 1.000 1.000 0.989 0.290 0.999 1.000
[5,] 0.330 0.730 0.989 0.989 1.000 0.325 0.992 0.989
[6,] 1.000 0.306 0.285 0.290 0.325 1.000 0.287 0.285
[7,] 0.292 0.660 0.999 0.999 0.992 0.287 1.000 0.999
[8,] 0.290 0.653 1.000 1.000 0.989 0.285 0.999 1.000
One of the purposes of a penalized regression is to drop correlated variables and improve the fit. From the glmnet vignette:
It is known that the ridge penalty shrinks the coefficients of correlated predictors towards each other while the lasso tends to pick one of them and discard the others. The elastic net penalty mixes these two: if predictors are correlated in groups, an α = 0.5 tends to either select or leave out the entire group of features. This is a higher level parameter, and users might pick a value upfront or experiment with a few different values. One use of α is for numerical stability; for example, the elastic net with α = 1 − ε for some small ε > 0 performs much like the lasso, but removes any degeneracies and wild behavior caused by extreme correlations.
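You can try the vignette's numerical-stability trick directly; a minimal sketch, where alpha = 0.95 is an arbitrary stand-in for 1 − ε:
# Elastic net with alpha just below 1 behaves much like the lasso but
# damps the degeneracies caused by the extreme correlations in x
fit_en <- glmnet::glmnet(x, y, family = "binomial", alpha = 0.95)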
If you set lambda to 0, convergence is very hard to reach, because with (near-)perfectly correlated variables the coefficients are not uniquely determined. This is touched upon in this post: you can set thresh to a smaller value, but I doubt it will work in your case, because the correlations are too high.
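If you want to try it anyway, here is a minimal sketch; thresh and maxit are glmnet's convergence threshold and iteration cap (defaults 1e-7 and 1e5), and the specific values below are arbitrary:
# Tighten the convergence threshold and raise the iteration cap so the
# coordinate descent runs longer at lambda = 0; with near-perfect
# correlations it may still fail to converge
fit <- glmnet::glmnet(x, y, family = "binomial", lambda = 0,
                      thresh = 1e-14, maxit = 1e6)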
To demonstrate that the correlations are the problem, change x to a random normal matrix and you can see it converges:
fit <- glmnet::glmnet(matrix(rnorm(80), ncol = 8), y, family = "binomial", lambda = 0)

Call:  glmnet::glmnet(x = matrix(rnorm(80), ncol = 8), y = y, family = "binomial", lambda = 0)

  Df %Dev Lambda
1  8  100      0
In summary, I would think about the purpose of the comparison. If you have a lot of predictors, many of them correlated, an unregularized regression will obviously perform poorly, which makes the case for something like glmnet. You can just use glm to prove your point. Otherwise, consider removing highly correlated predictors first.
Upvotes: 2