I have a query about the cv.glmnet() function in R, which is supposed to find the "optimum" value of the parameter lambda for ridge regression.
In the example code below, if you experiment with values of lambda smaller than the one cv.glmnet() returns, you will find that the error sum of squares is actually much smaller than the one obtained at cv.fit$lambda.min (a small sweep over the lambda grid, sketched after the code, makes this concrete).
I have noticed this with many datasets. Even the example in the well-known book Introduction to Statistical Learning (ISLR) by Gareth James et al. has this problem (Section 6.6.1, using the Hitters dataset): the value of lambda that actually minimizes the MSE is smaller than the one the book reports, both on the training data and on new test data.
What is the reason for this? What exactly is cv.fit$lambda.min returning?
Ravi
library(glmnet)  # provides glmnet() and cv.glmnet()

data(mtcars)
y = mtcars$hp
X = model.matrix(hp ~ mpg + wt + drat, data = mtcars)[, -1]  # drop the intercept column
X

lambdas = 10^seq(3, -2, by = -.1)

# ridge regression (alpha = 0) over the whole lambda grid
fit = glmnet(X, y, alpha = 0, lambda = lambdas)
summary(fit)

# cross-validation over the same grid
cv.fit = cv.glmnet(X, y, alpha = 0, lambda = lambdas)

# what is the optimum value of lambda?
(opt.lambda = cv.fit$lambda.min)  # 1.995262

# predictions on the training data at a much smaller lambda
y.pred = predict(fit, s = 0.01, newx = X, exact = T)  # gives lower SSE

# Sum of Squares Error
(sse = sum((y.pred - y)^2))
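To make the observation concrete, here is a minimal sketch that continues from the objects fit, X, y and lambdas defined above and computes the training-data SSE at every lambda in the grid (the name sse.by.lambda is just illustrative):

# training-data SSE for each lambda in the grid (continues from the code above)
sse.by.lambda = sapply(lambdas, function(l) {
  pred = predict(fit, s = l, newx = X, exact = TRUE)
  sum((pred - y)^2)
})
plot(log10(lambdas), sse.by.lambda, type = "b",
     xlab = "log10(lambda)", ylab = "training SSE")
# the training SSE keeps shrinking as lambda -> 0 (the fit approaches
# ordinary least squares), so the training data always favour the smallest lambda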
cv.glmnet searches for the lambda that minimizes the cross-validation error, not the training MSE. The training error keeps decreasing as lambda shrinks toward 0, because the fit approaches the unpenalized least-squares solution, so the training SSE will always favour a smaller lambda; lambda.min is instead the lambda that minimizes the average error over the held-out folds. From ?cv.glmnet:

The function runs glmnet nfolds+1 times; the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds is computed.
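As a rough illustration, re-using cv.fit from the question's code (the exact numbers change from run to run because the fold assignment is random), the quantity being minimized is the mean cross-validated error stored in cv.fit$cvm, not the training SSE:

# the CV curve that cv.glmnet actually minimizes
plot(cv.fit)  # mean CV error with error bars; dotted lines mark lambda.min and lambda.1se

# lambda.min is simply the lambda in the grid with the smallest mean CV error (cvm)
cv.fit$lambda[which.min(cv.fit$cvm)]              # equals cv.fit$lambda.min
cv.fit$cvm[which.min(cv.fit$cvm)]                 # CV error at lambda.min
cv.fit$cvm[which.min(abs(cv.fit$lambda - 0.01))]  # CV error near lambda = 0.01 (typically larger here)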