Reputation: 41
Glmnet with ridge regularization calculates coefficients for the first lambda value differently when the lambda vector is chosen by the glmnet algorithm than when the same vector is passed explicitly in the function call. For example, these two models (which I would expect to be identical)
> m <- glmnet(rbind(c(1, 0), c(0, 1)), c(1, 0), alpha=0)
> m2 <- glmnet(rbind(c(1, 0), c(0, 1)), c(1, 0), alpha=0, lambda=m$lambda)
give completely different coefficients:
> coef(m, s=m$lambda[1])
3 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept)  5.000000e-01
V1           1.010101e-36
V2          -1.010101e-36
> coef(m2, s=m2$lambda[1])
3 x 1 sparse Matrix of class "dgCMatrix"
                      1
(Intercept)  0.500000000
V1           0.000998004
V2          -0.000998004
The same happens with other datasets too. When lambda is not supplied to glmnet, all coefficients at the largest lambda, coef(m, s=m$lambda[1]), except the intercept, are essentially zero, and predictions are identical for any X (due to rounding?).
My questions:
Upvotes: 4
Views: 793
Reputation: 231
This is a tricky one. When alpha=0, the "starting" value of lambda (the value at which all coefficients except the intercept are zero) is infinity. Since we want to produce a grid of values that decrease geometrically from the starting value, infinity is not much use. So instead we report the starting value that would be used when alpha=0.001 (in this case 500) as the largest lambda.
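The rule described above can be sketched numerically. Assuming glmnet's usual starting-point formula lambda_max = max_j |&lt;x_j, y&gt;| / (n * alpha) on the centered, standardized predictors, with alpha floored at 0.001 when alpha=0, the toy data from the question gives exactly the 500 reported:

```python
import numpy as np

# Toy data from the question
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
y = np.array([1.0, 0.0])
n = len(y)

# Center y; center and standardize X (glmnet uses the 1/n variance)
yc = y - y.mean()
Xc = X - X.mean(axis=0)
Xs = Xc / Xc.std(axis=0)

# alpha=0 would give an infinite starting lambda, so floor it at 0.001
alpha = max(0.0, 0.001)
lambda_max = np.max(np.abs(Xs.T @ yc)) / (n * alpha)
print(lambda_max)  # 500.0
```

This is only a sketch of the grid-starting rule, not a re-implementation of glmnet; the exact internals live in the Fortran code behind the package.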
So, in m, the coefficients really are zero, but the largest lambda reported is 500 (when in reality it is infinity).
In m2, we actually produce the fit at 500 for the first position, and the coefficients are not quite zero.
To verify this, notice that the coefficients at all subsequent lambda values match between the two fits.
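To see why the first position is exactly zero in m but only nearly zero in m2: ridge coefficients shrink toward zero as lambda grows, reaching zero only in the limit. A textbook closed-form sketch on the centered toy data, using beta = (X'X + lambda*I)^{-1} X'y (not glmnet's exact parametrization or standardization), shows coefficients of about ±0.001 at lambda=500, like m2's, and numerically zero coefficients once lambda is huge, like m's:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
y = np.array([1.0, 0.0])

# Center, as glmnet does when fitting an intercept
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge(lam):
    # Textbook ridge estimator; glmnet's internal scaling differs,
    # but the qualitative shrinkage behaviour is the same.
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

print(ridge(500.0))  # small but nonzero, roughly +/-0.000998
print(ridge(1e12))   # effectively zero
```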
Trevor Hastie
Upvotes: 6