Reputation: 2977
How can a metric computed with cross_val_score differ from the same metric computed from cross_val_predict (which is used to obtain predictions that are then passed to a metric function)?
Here is an example:
from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
gnb_clf = GaussianNB()

# accuracy over the pooled out-of-fold predictions from cross_val_predict
predicted = cross_val_predict(gnb_clf, iris.data, iris.target, cv=5)
accuracy_cvp = metrics.accuracy_score(iris.target, predicted)

# mean of the per-fold accuracies from cross_val_score
score_cvs = cross_val_score(gnb_clf, iris.data, iris.target, cv=5)
accuracy_cvs = score_cvs.mean()

print('Accuracy cvp: %0.8f\nAccuracy cvs: %0.8f' % (accuracy_cvp, accuracy_cvs))
In this case, we obtain the same result:
Accuracy cvp: 0.95333333
Accuracy cvs: 0.95333333
Nevertheless, this does not always seem to be the case; the official documentation says (regarding a result computed using cross_val_predict):
Note that the result of this computation may be slightly different from those obtained using cross_val_score as the elements are grouped in different ways.
Upvotes: 1
Views: 1107
Reputation: 1
In addition to lejlot's answer, another way you might get slightly different results between cross_val_score and cross_val_predict is when the target classes are not distributed in a way that allows them to be split evenly between folds.
According to the documentation for cross_val_predict, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used by default. Because this splitter stratifies on the target, you can end up with folds of slightly different sizes even when the total number of instances in the dataset is divisible by the number of folds. This, in turn, makes the average of per-fold averages slightly different from the overall average.
For example, if you have 100 data points and 33 of these are the target class, then KFold with n_splits=5 would split this into 5 folds of 20 observations each, but StratifiedKFold would not necessarily give you equally-sized folds.
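A minimal sketch of this, assuming a synthetic 100-point dataset with 33 positives (the exact fold sizes may depend on the scikit-learn version):
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic data: 100 points, 33 of which belong to the positive class.
X = np.zeros((100, 1))
y = np.array([1] * 33 + [0] * 67)

kf_sizes = [len(test) for _, test in KFold(n_splits=5).split(X, y)]
skf_sizes = [len(test) for _, test in StratifiedKFold(n_splits=5).split(X, y)]

print(kf_sizes)   # [20, 20, 20, 20, 20] -- always equal, since 100 % 5 == 0
print(skf_sizes)  # need not all be 20: 33 positives cannot be split 5 ways evenly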
Upvotes: 0
Reputation: 66775
Imagine the following labels and fold split:
[010|101|10]
So you have 8 data points, 4 per class, and you split them into 3 folds, giving two folds with 3 elements and one with 2. Now let us assume that during cross-validation you get the following predictions:
[010|100|00]
Thus your per-fold scores are [100%, 67%, 50%], and cross_val_score (their average) is about 72%. Now what about the accuracy over the pooled predictions? You clearly have 6/8 right, thus 75%. As you can see, the scores are different, even though both rely on cross-validation. Here the difference arises because the splits are not exactly the same size: the last "50%" lowers the total score because it is an average over just 2 samples (while the others are based on 3).
There might be other similar phenomena; in general it boils down to the way the averaging is computed. Thus cross_val_score is an average over per-fold averages, which does not have to equal an average over the pooled cross-validation predictions.
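A minimal sketch of the arithmetic above (the fold boundaries are hard-coded to mirror the toy example):
import numpy as np

# Labels and out-of-fold predictions from the toy example, split 3|3|2.
y_true = np.array([0, 1, 0,  1, 0, 1,  1, 0])
y_pred = np.array([0, 1, 0,  1, 0, 0,  0, 0])
folds = [(0, 3), (3, 6), (6, 8)]  # (start, stop) index of each fold

# cross_val_score style: mean of the per-fold accuracies
fold_accs = [np.mean(y_true[a:b] == y_pred[a:b]) for a, b in folds]
print(fold_accs, np.mean(fold_accs))  # [1.0, 0.667, 0.5] -> ~0.722

# cross_val_predict style: accuracy over the pooled predictions
print(np.mean(y_true == y_pred))      # 0.75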
Upvotes: 1