Reputation: 31
When using 5-fold cross-validation to create a model, 5 different models are created. The selection of the final model can vary:
I understand that cross-validation is used for model checking, not for model building. So when predict_proba is used on the model, how is this probability defined? Could you share some papers or articles that explain how prediction works with cross-validation in caret (R) and in sklearn (Python)?
Upvotes: 0
Views: 441
Reputation: 8152
The docs for sklearn.model_selection.cross_val_predict make it clear that you can specify the prediction method with the method argument, e.g. method='predict_proba'.
If you do this, it simply calls that method internally instead of predict. The result is an out-of-fold estimate: each sample's prediction comes from the model that was fit while that sample's fold was held out as the validation set during cross-validation.
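As a minimal sketch of what that looks like in practice (the synthetic dataset and the choice of LogisticRegression are my own assumptions for illustration, not something from the question):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy binary classification data (assumed here purely for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(max_iter=1000)

# Each row is predicted by the model fit on the other 4 folds,
# so every probability is an out-of-fold estimate.
oof_proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")

print(oof_proba.shape)  # one row per sample, one column per class
```

Note that cross_val_predict works on clones of the estimator, so clf itself is still unfitted afterwards. These probabilities are useful for diagnostics, but they do not come from any single model.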
For what it's worth, I would not select the model from the best-scoring CV fold. Train the final model on all your data.
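A rough sketch of that workflow, again with a synthetic dataset and LogisticRegression as stand-ins I chose for illustration: use cross-validation only to estimate performance, then refit one model on everything and call predict_proba on that.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data (assumed for illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Step 1: cross-validation gives a performance estimate, one score per fold.
# The 5 fold models are discarded afterwards.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())

# Step 2: the final model is a single fit on all available data.
final_model = LogisticRegression(max_iter=1000).fit(X, y)

# Probabilities for new samples come from this one model.
# X[:3] stands in for unseen data here.
new_proba = final_model.predict_proba(X[:3])
print(new_proba)
```

Here the probability is defined the usual way for the chosen estimator; cross-validation plays no part in it beyond telling you how well to expect the model to perform.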
Upvotes: 0