How to use cross_val_predict to predict probabilities for a new dataset?

Question

I am using sklearn's cross_val_predict for training like so:

  myprobs_train = cross_val_predict(LogisticRegression(), X=x_old, y=y_old, method='predict_proba', cv=10)

I am happy with the returned probabilities, and would like now to score up a brand-new dataset. I tried:

  myprobs_test = cross_val_predict(LogisticRegression(), X=x_new, y=None, method='predict_proba', cv=10)

but this did not work; it complains about y having zero shape. Does this mean there's no way to apply the trained and cross-validated model from cross_val_predict to new data? Or am I just using it wrong?

Thank you!

user6655984 · Accepted Answer

You are looking at the wrong method. Cross-validation functions such as cross_val_predict do not return a trained model; they return out-of-fold predictions or scores that are used to evaluate the performance of a model (logistic regression in your case). Your goal is to fit a model on some data and then generate predictions for new data. The relevant methods are fit and predict of the LogisticRegression class, with predict_proba in place of predict if you want probabilities rather than class labels. Here is the basic structure:

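  from sklearn import linear_model

  logreg = linear_model.LogisticRegression()
  logreg.fit(x_old, y_old)  # train once on all of the labelled data
  predictions = logreg.predict(x_new)  # class labels; use predict_proba for probabilities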
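
Since the question asks for probabilities, here is a slightly fuller sketch of the same idea. The data are synthetic stand-ins generated with make_classification (an assumption for illustration; substitute your own x_old, y_old and x_new), and the variable names follow the question. It runs cross_val_predict on the labelled data to get out-of-fold probabilities, then fits a single model on all of the labelled data and calls predict_proba on the new data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic stand-ins for the question's x_old, y_old and x_new (assumed data).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
x_old, y_old, x_new = X[:200], y[:200], X[200:]

# Out-of-fold probabilities: each row is predicted by a model that was
# trained without it. No fitted estimator is returned from this call.
myprobs_train = cross_val_predict(
    LogisticRegression(), x_old, y_old, method="predict_proba", cv=10
)

# To score new data, fit one model on all labelled data, then predict_proba.
logreg = LogisticRegression()
logreg.fit(x_old, y_old)
myprobs_test = logreg.predict_proba(x_new)

print(myprobs_train.shape, myprobs_test.shape)
```

Each row of myprobs_test holds one probability per class, in the order given by logreg.classes_. The cross-validated probabilities are still useful as an estimate of how the final model is likely to behave on unseen data, but the model that scores x_new is the one fitted on the full labelled set.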
