Reputation: 6055
I was reading the glmnet documentation and I found this:
"Note also that the results of cv.glmnet are random, since the folds are selected at random. Users can reduce this randomness by running cv.glmnet many times, and averaging the error curves."
The following code uses caret with repeated cross-validation:
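library(caret)
# Repeated CV: `number` is not set, so caret uses its default of 10 folds, repeated 10 times
ctrl <- trainControl(verboseIter = TRUE, classProbs = TRUE,
                     summaryFunction = twoClassSummary, method = "repeatedcv",
                     repeats = 10)
# x holds the predictors, y is a two-level factor outcome
fit <- train(x, y, method = "glmnet", metric = "ROC", trControl = ctrl)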
Is that the best way to run glmnet with cross-validation through caret, or is it better to run glmnet directly?
Upvotes: 4
Views: 2813
Reputation: 11
You need to define what "best" means. Do you want to:
Run a regularized regression on its own on a dataset for feature selection? In that case, use glmnet directly. Max Kuhn has implied that you may be better off using models with built-in CV features, as they will have been optimized for both predictor selection and minimizing error. See the quote below.
"In many cases, using these models with built-in feature selection will be more efficient than algorithms where the search routine for the right predictors is external to the model. Built-in feature selection typically couples the predictor search algorithm with the parameter estimation and are usually optimized with a single objective function (e.g. error rates or likelihood)." (Kuhn, caret package documentation: caret feature selection overview)
Or compare different models, one of which is glmnet? In that case, caret may be a great choice.
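If you go the direct glmnet route, the averaging that the glmnet documentation mentions might look roughly like the sketch below. This is only an illustration, not a recipe prescribed by either package: the toy x and y stand in for your data, the names lambdas, n_repeats, cvm, mean_cvm and best_lambda are made up here, and I am assuming a binomial outcome scored by AUC to mirror the ROC metric in the question. The one design choice that matters is fixing the lambda sequence up front, so that every repeat reports its error curve on the same grid and the curves can be averaged point by point.

```r
library(glmnet)

set.seed(1)
# Hypothetical toy data standing in for the asker's x and y
n <- 200
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- factor(ifelse(x[, 1] - x[, 2] + rnorm(n) > 0, "yes", "no"))

# Fit once on the full data to get a lambda path, then reuse that
# path in every repeat so all error curves share the same grid
lambdas <- glmnet(x, y, family = "binomial")$lambda

# Run cv.glmnet several times; each call draws new random folds.
# Each column of cvm is one cross-validated AUC curve over lambdas.
n_repeats <- 10
cvm <- sapply(seq_len(n_repeats), function(i) {
  cv.glmnet(x, y, family = "binomial", lambda = lambdas,
            nfolds = 10, type.measure = "auc")$cvm
})

# Average the curves across repeats and pick the lambda with the
# highest mean AUC (use which.min instead for deviance or error)
mean_cvm <- rowMeans(cvm)
best_lambda <- lambdas[which.max(mean_cvm)]
```

You could then pass best_lambda as `s` to `coef()` or `predict()` on a glmnet fit of the full data. Note that this only averages over folds for lambda; tuning alpha as well would need an outer loop, which is the part caret handles for you.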
Upvotes: 1