guyguyguy12345

Reputation: 571

How to extract actual classification error rate with cost function from cv.glmnet, so I can compare with cv.glm?

The cvm of cv.glmnet for binomial regression is actually the binomial deviance. How can I extract the cross-validated classification error rate from a cv.glmnet object? I need it to compare with the cross-validated error rate from cv.glm.

Upvotes: 2

Views: 3382

Answers (2)

Simon Ji

Reputation: 759

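Another approach is to ask cv.glmnet for the misclassification error directly by setting type.measure = "class", so that cvm holds the cross-validated error rate instead of the deviance: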

cv.glmnet(x2.2, y2, alpha=1, family="binomial", type.measure = "class")
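To pull the actual numbers out of the fitted object, something like the sketch below should work. It uses simulated data rather than the asker's x2.2 and y2, and the names x, y, cv_fit, err_min and err_1se are made up for illustration.

```r
library(glmnet)

# Simulated data, purely for illustration (not the asker's x2.2 / y2).
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# With type.measure = "class", cvm holds the mean cross-validated
# misclassification rate for each value of lambda.
cv_fit <- cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "class")

# Cross-validated error rate at the two lambdas that cv.glmnet reports.
err_min <- cv_fit$cvm[cv_fit$lambda == cv_fit$lambda.min]
err_1se <- cv_fit$cvm[cv_fit$lambda == cv_fit$lambda.1se]
```

err_min is the smallest cross-validated error rate along the lambda path, and is the value you would most likely set against the first element of delta from cv.glm.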

Upvotes: 1

lrnzcig

Reputation: 3947

By default, cv.glmnet reports the binomial deviance, while cv.glm with a misclassification cost function reports the classification error. To compare them, you need to predict the output class from the cv.glmnet fit and take the mean of the classification errors:

cv2.2.lasso=cv.glmnet(x2.2, y2, alpha=1, family="binomial")
mean(predict(cv2.2.lasso, x2.2, s=cv2.2.lasso$lambda.1se, type="class") != y2)

However, the code above calculates the classification error of the model fitted on all the data, not the cross-validated classification error. If you are not overfitting, the two values should be close, at least in order of magnitude, but they are not really comparable. If you really need to compare the two, you should run the cross-validation loop yourself, with something like this:

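errors <- vector(mode="list", number_of_folds)
# Assign each row at random to one of the folds 0 .. number_of_folds-1
rand <- floor(runif(dim(input_data)[1], min=0, max=number_of_folds))
# The design matrix does not depend on the fold, so build it once
folds.x <- model.matrix(formula, data=input_data)

for (fold in 0:(number_of_folds-1)) {
  print(paste("fold", fold))

  # Hold out the current fold for testing and train on the rest
  folds.x.train <- folds.x[rand != fold,]
  folds.x.test <- folds.x[rand == fold,]
  folds.y.train <- input_data[rand != fold, results_column_name]
  folds.y.test <- input_data[rand == fold, results_column_name]

  folds.fit <- glmnet(folds.x.train, folds.y.train, alpha=1, family="binomial")
  # One column of predicted classes per value of lambda
  folds.fit.test <- predict(folds.fit, folds.x.test, type="class")
  # Number of misclassified held-out rows per value of lambda
  errors[[fold+1]] <- apply(folds.fit.test != as.character(folds.y.test), 2, sum)
}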

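Each element of the list errors then contains, for one fold, the number of misclassified observations for each value of lambda. Per lambda, you need to turn these counts into a mean error rate (for example, sum them across folds and divide by the total number of observations), and then choose the lambda to compare with the other model. This is only meaningful if every fold uses the same lambda sequence, so pass a common lambda argument to glmnet; otherwise each fold gets its own sequence.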
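As a rough sketch of that last aggregation step, assuming all folds were fitted on the same lambda sequence: the counts, fold sizes and lambda values below are invented for illustration, and fold_sizes, lambdas, cv_error and the best_* names are hypothetical.

```r
# Hypothetical per-fold error counts: one vector per fold, one entry per lambda.
# These numbers are made up for illustration only.
errors <- list(
  c(12, 9, 7, 8),
  c(10, 8, 6, 9),
  c(11, 7, 8, 8)
)
fold_sizes <- c(40, 40, 40)            # hypothetical number of held-out rows per fold
lambdas <- c(0.1, 0.05, 0.01, 0.005)   # hypothetical lambda sequence shared by all folds

# Stack the folds as rows, with one column per lambda.
error_counts <- do.call(rbind, errors)

# Pooled cross-validated error rate per lambda:
# total misclassified rows divided by total held-out rows.
cv_error <- colSums(error_counts) / sum(fold_sizes)

# Lambda with the lowest cross-validated error rate, and that rate.
best <- which.min(cv_error)
best_lambda <- lambdas[best]
best_error <- cv_error[best]
```

best_error is then the number to set against the cross-validated error rate from cv.glm.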

Hope it helps.

Upvotes: 3
