Nile
Nile

Reputation: 333

Extract the fitted values, residuals and the summary statistics from a cv.glmnet

I wonder how I can extract the fitted values, residuals and the summary statistics from a cv.glmnet object for a specific lambda (e.g. "lambda.1se"). Assume only I have access to the cv.glmnet object and not the training data directly. Here is an example:

data(iris)
Inx <- sample(nrow(iris),100)
iris$Species <- factor(iris$Species)

train_data <- iris[Inx, ]
test_data <- iris[-Inx,]

Formula <- "Sepal.Length ~ Sepal.Width + Petal.Length + Species:Petal.Width + 
            Sepal.Width:Petal.Length +  Species +  splines::bs(Petal.Width, df = 2, 
            degree = 2)"

------- glm cross validation --------------

ModelMatrix <- predict(caret::dummyVars(Formula, train_data, fullRank = T,  
                       sep = ""), train_data)
cvglm <- glmnet::cv.glmnet(x = ModelMatrix,
                           y = train_data$Sepal.Length, nfolds = 4, keep = TRUE,  
                           alpha = 1, parallel = F, type.measure = 'mse') 

Thank you for any suggestion.

Upvotes: 1

Views: 4677

Answers (3)

StupidWolf
StupidWolf

Reputation: 46908

I understand your question now, you set keep =TRUE and expect to find the fitted values. So let's run this:

cvglm <- glmnet::cv.glmnet(x = ModelMatrix,
                           y = train_data$Sepal.Length, nfolds = 4, keep = TRUE,  
                           alpha = 1, parallel = F, type.measure = 'mse') 

There are fitted values are stored under:

dim(cvglm$fit.preval)
[1] 100 100

It's for 1 column for every lambda value. However these are

a prevalidated array is returned containing fitted values for each observation and each value of ‘lambda’. This means these fits are computed with this observation and the rest of its fold omitted

During CV, these values are the predicted values on the un-fitted fold, and used for calculating the mse. So for example, we can get back the cross-validated mse for the 100th value of lambda tested:

mean((cvglm$fit.preval[,100] - train_data$Sepal.Length)^2)
[1] 0.1072564

cvglm$cvm[100]
[1] 0.1072564

For the usual summary statistics you get from summary(lm()) , you need to work with what is under cvglm$glmnet, but it does not come with a handy function. It would be what @ColinH pointed out in the answer, that is to calculate them somewhat manually.. The training data is not stored in the object.

Upvotes: 2

Colin H
Colin H

Reputation: 660

The cv.glmnet object does not directly save the fitted values or the residuals. Assuming you have at least some sort of test or validation matrix (test_df convertible to test_matrix) you can calculate both fitted values and residuals. The s argument to the predict function allows one to access the betas for a particular lambda.

fitted_values <- predict(cvglm, test_matrix, s = 'lambda.1se')

residuals <- test_df$actual_values - fitted_values

For summary statistics, you probably want to access the cvglm$cvm parameter. This is the cross validation measure of error used to decide which lambda produces the best model. In your original post, the type.measure parameter is set to mse therefore this vector will be the MSEs produced for each level of lambda.

Here's how to find the value for lambda.1se:

cvglm$cvm[cvglm$lambda == cvglm$lambda.1se]

From your other comments, it sounds like you may want the deviance ratio, which is accessible in the cvglm$glmnet.fit$dev.ratio vector.

cvglm$glmnet.fit$dev.ratio[cvglm$lambda == cvglm$lambda.1se]

From the docs on the dev.ratio:

The fraction of (null) deviance explained (for"elnet", this is the R-square).The deviance calculations incorporate weights if present in the model. The de-viance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observa-tion). Hence dev.ratio=1-dev/nulldev.

Upvotes: 2

Sathish
Sathish

Reputation: 12713

Fitted values:

cvglm$glmnet.fit

Call:  glmnet(x = ModelMatrix, y = train_data$Sepal.Length, parallel = F,      alpha = 1) 

      Df   %Dev    Lambda
 [1,]  0 0.0000 7.047e-01
 [2,]  1 0.1341 6.421e-01
 [3,]  1 0.2454 5.851e-01
 [4,]  1 0.3378 5.331e-01
 [5,]  1 0.4146 4.857e-01
 [6,]  1 0.4783 4.426e-01
 ...omitted

Summary:

do.call( 'cbind', list(lambda = cvglm$lambda, mean = cvglm$cvm, sd = cvglm$cvsd, cvup = cvglm$cvup, cvlo = cvglm$cvlo))       

            lambda      mean          sd      cvup       cvlo
 [1,] 0.7047388085 0.6245721 0.049917703 0.6744898 0.57465442
 [2,] 0.6421317551 0.5526067 0.058082258 0.6106889 0.49452443
 [3,] 0.5850865398 0.4815325 0.053453517 0.5349860 0.42807898
 [4,] 0.5331090641 0.4225556 0.049193377 0.4717490 0.37336222
 [5,] 0.4857491242 0.3736194 0.045278477 0.4188979 0.32834093
 ...omitted

Upvotes: 1

Related Questions