Har

Reputation: 31

How does predict_proba work with cross-validation?

When using 5-fold cross-validation, 5 different models are trained. The selection of the final model can vary:

  1. the best model (by some criterion) out of the 5 fold-specific models, or
  2. the model created by training on the whole dataset.

I understand that cross-validation is used for model checking, not for model building. So when predict_proba is called on the model, how is this probability defined? Could you share some papers or articles that explain how prediction works with cross-validation in caret in R and in sklearn in Python?

Upvotes: 0

Views: 441

Answers (1)

Matt Hall

Reputation: 8152

The docs for sklearn.model_selection.cross_val_predict make it clear that you can specify the prediction method with the method argument, e.g. method='predict_proba'.

If you do this, it simply calls that method internally instead of predict. The result for each sample is the prediction made by the model for whose fold that sample was in the validation set, i.e. an out-of-fold estimate of the target.
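A minimal sketch of this, using a synthetic dataset and a logistic regression as illustrative assumptions (not from the answer itself):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic binary-classification data for illustration.
X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Each row of `proba` is produced by the fold model that held
# that row out as validation data, so these are out-of-fold
# probability estimates, not predictions from one final model.
proba = cross_val_predict(clf, X, y, cv=5, method='predict_proba')
print(proba.shape)  # one row per sample, one column per class
```

Note that no single fitted model is returned here; `cross_val_predict` only stitches together the held-out predictions from the 5 fold models.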

For what it's worth, I would not select the model from the best-scoring CV fold. Train the final model on all your data.
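That pattern might look like the following sketch, where cross-validation scores the model and the final fit uses all the data (dataset and estimator are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Cross-validation estimates generalization performance only.
scores = cross_val_score(clf, X, y, cv=5)

# The final model is then trained on the whole dataset,
# not taken from the best-scoring fold.
clf.fit(X, y)
proba = clf.predict_proba(X[:1])
```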

Upvotes: 0
