Reputation: 1557
The documentation page for sklearn.model_selection.cross_val_predict states:
Generate cross-validated estimates for each input data point. It is not appropriate to pass these predictions into an evaluation metric.
Can someone explain what this means? If it gives an estimate of Y (a predicted y) for every Y (the true y), why can't I calculate metrics such as RMSE or the coefficient of determination from these results?
Upvotes: 1
Views: 2360
Reputation: 16
Just to add a little more clarity, the difference is easier to understand if you consider a non-linear scoring function such as the maximum absolute error instead of something like the mean absolute error.
cross_val_score() would compute the maximum absolute error on each of the 3 folds (assuming a 3-fold cross-validator) and report an aggregate (say, the mean) over those 3 scores. That is, something like mean(a, b, c), where a, b, c are the maximum absolute errors of the 3 folds respectively. It is reasonably safe to interpret that returned value as the maximum absolute error of your estimator in the average or general case.
With cross_val_predict() you would get 3 sets of predictions corresponding to the 3 folds, and taking the maximum absolute error over the aggregate (concatenation) of these 3 sets of predictions is certainly not the same as the above. Even if the predicted values are identical in both scenarios, what you end up with here is max(a, b, c), which would be an unreasonable and overly pessimistic characterisation of the maximum-absolute-error score of your model.
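To make that concrete, here is a minimal sketch of the two computations; the dataset (make_regression) and the Ridge estimator are just illustrative choices, and sklearn's max_error metric plays the role of the maximum absolute error above:
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, max_error
from sklearn.model_selection import cross_val_predict, cross_val_score

# Illustrative data and estimator; any regressor would do
X, y = make_regression(n_samples=150, n_features=10, noise=10.0, random_state=0)
model = Ridge()

# mean(a, b, c): average of the per-fold maximum absolute errors
per_fold_max = cross_val_score(model, X, y, cv=3, scoring=make_scorer(max_error))
print("mean of per-fold max errors:", per_fold_max.mean())

# max(a, b, c): maximum absolute error over the pooled predictions
y_pred = cross_val_predict(model, X, y, cv=3)
print("max error over pooled predictions:", max_error(y, y_pred))
The second number can never be smaller than the first, since max(a, b, c) >= mean(a, b, c), which is why pooling the predictions gives an overly pessimistic max-error score.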
Upvotes: 0
Reputation: 5955
It seems to come down to how the samples are grouped when the metric is computed. From the user guide linked in the cross_val_predict docs:
Warning: Note on inappropriate usage of cross_val_predict
The result of cross_val_predict may be different from those obtained using cross_val_score as the elements are grouped in different ways. The function cross_val_score takes an average over cross-validation folds, whereas cross_val_predict simply returns the labels (or probabilities) from several distinct models undistinguished. Thus, cross_val_predict is not an appropriate measure of generalisation error.
That is, cross_val_score averages the score across all of the folds, whereas cross_val_predict simply pools the predictions from several distinct models, so a metric computed over its output is a different quantity and won't necessarily reflect generalisation error. For example, using the sample code from the sklearn page:
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.metrics import mean_squared_error, make_scorer

diabetes = datasets.load_diabetes()
X = diabetes.data[:200]
y = diabetes.target[:200]
lasso = linear_model.Lasso()

# MSE computed over the pooled cross-validated predictions
y_pred = cross_val_predict(lasso, X, y, cv=3)
print("Cross Val Prediction score:{}".format(mean_squared_error(y, y_pred)))

# Mean of the three per-fold MSEs
print("Cross Val Score:{}".format(np.mean(cross_val_score(lasso, X, y, cv=3, scoring=make_scorer(mean_squared_error)))))
Cross Val Prediction score:3993.771257795029
Cross Val Score:3997.1789145156217
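The two numbers are close but not identical. With 200 samples and cv=3 the folds have 67, 67 and 66 samples, so averaging the three per-fold MSEs weights each fold equally rather than each sample equally, which is what mean_squared_error over the pooled predictions does; with exactly equal fold sizes the two values would coincide for MSE, but for a non-linear score like the maximum absolute error they differ regardless.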
Upvotes: 1