user1995

Reputation: 538

Can I obtain a glm object from the output of glmnet?

I have used glmnet to fit a ridge-penalized (i.e. L2-regularized) logistic model:

ridge.model <- glmnet(x, y, alpha = 0, family = "binomial", lambda = bestlam)

Now, I want to find its test error rate using 10-fold cross-validation. This can be done with cv.glm, like so:

fit_10CV <- glm(good ~ ., family = binomial, data = winedata)
fit_10CV.cv.err <- cv.glm(winedata, fit_10CV, cost1, K = 10) # 10-fold CV
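(cost1 is not defined in the question; with cv.glm, the cost argument is a function of the observed responses and the cross-validated predictions. A minimal sketch of such a misclassification-rate cost function, assuming y is coded 0/1 and pred holds fitted probabilities, might be:)

```r
# Hypothetical cost1: misclassification rate for 0/1 responses,
# where pred holds fitted probabilities from the glm
cost1 <- function(y, pred) mean(abs(y - pred) > 0.5)
```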

But cv.glm requires a glm object (fit_10CV in this case), and the output of glmnet is a glmnet object, which cv.glm cannot take. I feel so close yet so far: glmnet is giving me the logistic regression model I need, but not in a form (i.e. a glm object) that I can plug into cv.glm to get the 10-fold CV test error.

Any help will be appreciated!

Upvotes: 0

Views: 838

Answers (2)

StupidWolf

Reputation: 47008

There are two ways you can do this. First, set up the data:

url = "https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/winequality-white.csv"
winedata = read.csv(url)
winedata$good = factor(ifelse(winedata$quality > 6, 1, 0))
winedata$quality = NULL

First we run cv.glmnet; it retains only the mean cross-validated error (and its standard error) at each lambda:

library(caret)
library(glmnet)

x = model.matrix(good ~ ., data = winedata)
cv_glmnet = cv.glmnet(x, winedata$good, family = "binomial",
                      type.measure = "class", alpha = 0, nfolds = 10)

We collect the lambdas tested and use caret to do the CV:

tr = trainControl(method = "cv", number = 10)
trGrid = data.frame(lambda = cv_glmnet$lambda, alpha = 0)
cv_caret = train(good ~ ., data = winedata,
                 trControl = tr, tuneGrid = trGrid,
                 family = "binomial", method = "glmnet")

caret measures Accuracy, and 1 - Accuracy is the misclassification error you get from cv.glmnet. Putting them together, you can see they are very similar:

library(ggplot2)
df = rbind(
  data.frame(lambda = cv_glmnet$lambda, mean_error = cv_glmnet$cvm,
             method = "cv.glmnet"),
  data.frame(lambda = cv_caret$results$lambda,
             mean_error = 1 - cv_caret$results$Accuracy,
             method = "cv.caret")
)

ggplot(df, aes(x = log(lambda), y = mean_error, col = method)) +
  geom_point() + facet_wrap(~method) + theme_bw()

[Plot: mean CV error vs. log(lambda), faceted by method (cv.glmnet vs. cv.caret)]

You can get the errors from the individual resamples, for the best model, with:

cv_caret$resample
    Accuracy     Kappa Resample
1  0.7975460 0.1987720   Fold09
2  0.8163265 0.3127844   Fold01
3  0.7918367 0.1631505   Fold02
4  0.8057260 0.2311448   Fold05
5  0.8061224 0.2777347   Fold04
6  0.7979592 0.1989498   Fold07
7  0.8081633 0.2759683   Fold10
8  0.8183673 0.3053742   Fold08
9  0.8102041 0.2474983   Fold03
10 0.7979592 0.1989498   Fold06

Upvotes: 1

RyanFrost

Reputation: 1428

The glmnet package provides a function cv.glmnet that performs cross-validation for glmnet objects. Here's the documentation.

You can specify a range of lambda values to search over (or you can let cv.glmnet pick those values). Then, cv.glmnet will calculate the cross-validation error over all of those lambdas, and you can pick out the CV error for your preferred lambda.

For your variables it would look like:

cvfit <- cv.glmnet(x, y, alpha = 0, family = "binomial", 
                   nfolds = 10, lambda = your_lambda_vector)

lambda_min_ind <- which(cvfit$lambda == cvfit$lambda.min)
cverr_min <- cvfit$cvm[lambda_min_ind]
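One caveat not stated in the answer above: for family = "binomial", cv.glmnet's default measure is the binomial deviance, so cvm is only a misclassification error rate if you pass type.measure = "class". A self-contained sketch (using simulated data standing in for the asker's x and y, which are assumptions here):

```r
library(glmnet)

set.seed(1)
# Simulated stand-ins for the asker's x and y (assumption)
x <- matrix(rnorm(200 * 10), 200, 10)
y <- rbinom(200, 1, plogis(x[, 1]))

# type.measure = "class" makes cvm the CV misclassification error rate
cvfit <- cv.glmnet(x, y, alpha = 0, family = "binomial",
                   nfolds = 10, type.measure = "class")
cverr_min <- cvfit$cvm[cvfit$lambda == cvfit$lambda.min]
```

cverr_min is then the estimated 10-fold CV error rate at the best lambda, directly comparable to what cv.glm would report with a misclassification cost.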

Upvotes: 2
