user3490622

Reputation: 1001

How to use cross_val_predict to predict probabilities for a new dataset?

I am using sklearn's cross_val_predict for training like so:

  myprobs_train = cross_val_predict(LogisticRegression(), X=x_old, y=y_old, method='predict_proba', cv=10)

I am happy with the returned probabilities and would now like to score a brand-new dataset. I tried:

  myprobs_test = cross_val_predict(LogisticRegression(), X=x_new, y=None, method='predict_proba', cv=10)

but this did not work; it complains about y having zero shape. Does this mean there is no way to apply the trained and cross-validated model from cross_val_predict to new data? Or am I just using it wrong?

Thank you!

Upvotes: 2

Views: 1974

Answers (2)

udothemath

Reputation: 137

I have the same concern as @user3490622. If cross_val_predict can only be used on training and testing sets, why does y (the target) default to None? (See the sklearn page.)

To partially achieve the desired result of several predicted probabilities per sample, one could repeat the fit-then-predict approach to mimic cross-validation.
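The repeated fit-then-predict idea could look roughly like the sketch below. It assumes synthetic data from make_classification as stand-ins for x_old, y_old and x_new, and it averages the per-fold probabilities, which is one possible choice rather than anything cross_val_predict does itself.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-ins for x_old, y_old and x_new (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
x_old, y_old, x_new = X[:200], y[:200], X[200:]

base_model = LogisticRegression()
cv = StratifiedKFold(n_splits=10)

fold_probs = []
for train_idx, _ in cv.split(x_old, y_old):
    # Fit a fresh copy of the model on this fold's training portion ...
    model = clone(base_model)
    model.fit(x_old[train_idx], y_old[train_idx])
    # ... and score the new, unlabelled data with it.
    fold_probs.append(model.predict_proba(x_new))

# Average the 10 per-fold predictions: shape (n_new_samples, n_classes).
myprobs_test = np.mean(fold_probs, axis=0)
```

Each fold's model sees about 90% of the old data, so the spread across fold_probs gives a rough sense of how sensitive the predictions are to the training sample.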

Upvotes: 0

user6655984

Reputation:

You are looking at the wrong method. Cross-validation functions do not return a trained model; they return values used to evaluate the performance of a model (logistic regression in your case), and the models fitted on each fold are discarded. Your goal is to fit a model on some data and then generate predictions for new data. The relevant methods are fit and predict of the LogisticRegression class (or predict_proba, since you want probabilities). Here is the basic structure:

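from sklearn import linear_model

logreg = linear_model.LogisticRegression()
logreg.fit(x_old, y_old)  # train on all of the labelled data
predictions = logreg.predict(x_new)  # class labels; predict_proba gives probabilities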
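Since the question asks for probabilities, here is a minimal end-to-end sketch of the two steps side by side. The data is synthetic (make_classification is an assumption standing in for x_old, y_old and x_new): cross_val_predict is used only to evaluate on the old data, and a separately fitted model scores the new data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic stand-ins for x_old, y_old and x_new (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
x_old, y_old, x_new = X[:200], y[:200], X[200:]

# Step 1: out-of-fold probabilities on the labelled data, for evaluation only.
myprobs_train = cross_val_predict(
    LogisticRegression(), x_old, y_old, method='predict_proba', cv=10
)

# Step 2: fit one model on all labelled data and score the new data.
logreg = LogisticRegression()
logreg.fit(x_old, y_old)
myprobs_test = logreg.predict_proba(x_new)  # shape (n_new_samples, n_classes)
```

The columns of myprobs_test follow the order of logreg.classes_.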

Upvotes: 6
