Doppler

Reputation: 65

Type measure differences in glmnet package?

What is the difference between using "mse" and "class" in the glmnet package?

log_x <- model.matrix(response~.,train)
log_y <- ifelse(train$response=="good",1,0)
log_cv <- cv.glmnet(log_x,log_y,alpha=1,family="binomial", type.measure =  "class")
summary(log_cv)
plot(log_cv)

vs.

log_x <- model.matrix(response~.,train)
log_y <- ifelse(train$response=="good",1,0)
log_cv <- cv.glmnet(log_x,log_y,alpha=1,family="binomial", type.measure =  "mse")
summary(log_cv)
plot(log_cv)

I'm noticing that I get a slightly different curve (it differs in smoothness) in my plot, and a few percent difference in accuracy. But for predicting a binomial class response, is one type measure more appropriate than the other?

Upvotes: 4

Views: 3629

Answers (2)

izmirlig

Reputation: 9

For the binomial family the default in cv.glmnet is not the misclassification error but the deviance.

Secondly, you would never use the mean squared error loss for the binomial family.

Appropriate type measures for the binomial family in cv.glmnet are "default" (deviance), "auc" (the area under the ROC curve) and "class" (the misclassification error). That's it.

Upvotes: 0

Felipe Alvarenga

Reputation: 2652

It depends on your case study and what you want to learn from your model. From the help files:

The default is type.measure="deviance", which uses squared-error for gaussian models (a.k.a type.measure="mse" there) [...]. type.measure="class" applies to binomial and multinomial logistic regression only, and gives misclassification error

Therefore, you have to ask yourself whether, in your problem, you want to minimize misclassification error or the mean squared error.
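To make the distinction concrete, here is a minimal base R sketch of what each kind of statistic measures. The labels and predicted probabilities are made up for illustration, and the formulas follow the textbook definitions; glmnet's internal scaling of these quantities may differ.

```r
# Hypothetical true labels and predicted probabilities of class 1
y <- c(1, 0, 1, 1, 0)
p <- c(0.9, 0.2, 0.6, 0.4, 0.1)

# Misclassification error: share of labels missed at a 0.5 cutoff
class_err <- mean((p > 0.5) != y)

# Squared error between the 0/1 label and the predicted probability
mse <- mean((y - p)^2)

# Binomial deviance: -2 times the average log-likelihood
dev <- -2 * mean(y * log(p) + (1 - y) * log(1 - p))

print(c(class = class_err, mse = mse, deviance = dev))
```

The misclassification error only changes when a probability crosses the cutoff, so it moves in steps, whereas the squared error and the deviance change continuously with the probabilities. That is one plausible reason the two cross-validation curves differ in smoothness.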

There is no straightforward answer as to which is best. They are two different statistics that cross-validation uses to choose the penalization parameter among the candidate models.
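One way to see how much the choice matters on a given problem is to run cv.glmnet once per measure on the same folds and compare the selected lambda. The sketch below uses simulated data, so `x`, `y`, `foldid` and the sample sizes are assumptions for illustration rather than the data from the question.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p)
# Simulated binary response: only the first two columns carry signal
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# Reuse the same folds for every measure so the results are comparable
foldid <- sample(rep(1:10, length.out = n))

measures <- c("deviance", "class", "auc", "mse")
fits <- lapply(measures, function(m) {
  cv.glmnet(x, y, family = "binomial", alpha = 1,
            type.measure = m, foldid = foldid)
})
names(fits) <- measures

# Penalty value that is optimal under each criterion
lambda_min <- sapply(fits, function(f) f$lambda.min)
print(lambda_min)
```

Fixing `foldid` removes fold-to-fold randomness from the comparison, so any difference in `lambda.min` comes from the measure itself. Calling `plot(fits$class)` and `plot(fits$deviance)` would show the two curves side by side.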

Upvotes: 2
