Reputation: 19855
I am trying to use cross-validation to test my classifier with scikit-learn.
I have 3 classes and a total of 50 samples.
The following runs as expected, presumably performing 5-fold cross-validation:
result = cross_validation.cross_val_score(classifier, X, y, cv=5)
I am trying to do leave-one-out by using cv=50 folds, so I do the following:
result = cross_validation.cross_val_score(classifier, X, y, cv=50)
However, surprisingly, it gives the following error:
/Library/Python/2.7/site-packages/sklearn/cross_validation.py:413: Warning: The least populated class in y has only 5 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=50.
% (min_labels, self.n_folds)), Warning)
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/_methods.py:67: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "b.py", line 96, in <module>
scores1 = cross_validation.cross_val_score(classifier, X, y, cv=50)
File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1151, in cross_val_score
for train, test in cv)
File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 653, in __call__
self.dispatch(function, args, kwargs)
File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 400, in dispatch
job = ImmediateApply(func, args, kwargs)
File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 138, in __init__
self.results = func(*args, **kwargs)
File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1240, in _fit_and_score
test_score = _score(estimator, X_test, y_test, scorer)
File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1296, in _score
score = scorer(estimator, X_test, y_test)
File "/Library/Python/2.7/site-packages/sklearn/metrics/scorer.py", line 176, in _passthrough_scorer
return estimator.score(*args, **kwargs)
File "/Library/Python/2.7/site-packages/sklearn/base.py", line 291, in score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
File "/Library/Python/2.7/site-packages/sklearn/neighbors/classification.py", line 147, in predict
neigh_dist, neigh_ind = self.kneighbors(X)
File "/Library/Python/2.7/site-packages/sklearn/neighbors/base.py", line 332, in kneighbors
return_distance=return_distance)
File "binary_tree.pxi", line 1307, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn/neighbors/kd_tree.c:10506)
File "binary_tree.pxi", line 226, in sklearn.neighbors.kd_tree.get_memview_DTYPE_2D (sklearn/neighbors/kd_tree.c:2715)
File "stringsource", line 247, in View.MemoryView.array_cwrapper (sklearn/neighbors/kd_tree.c:24789)
File "stringsource", line 147, in View.MemoryView.array.__cinit__ (sklearn/neighbors/kd_tree.c:23664)
ValueError: Invalid shape in axis 0: 0.
Another odd thing: with cv=5 I don't get any warnings, but with cv=50 I get the warning above. I would expect that as cv gets bigger the result should be more accurate, even though it may be computationally more expensive. Is there a gap in my reasoning? Why do I get the warning and the error?
How can I properly do leave-one-out cross-validation in this scenario?
Upvotes: 2
Views: 2792
Reputation: 28788
By default, cv=5 for classification does stratified 5-fold cross-validation. That means it tries to keep the fraction of samples from each class constant across folds. This may cause trouble when the number of folds is the same as the number of samples, because no class has enough members to appear in every fold. Which version are you on? This error message is certainly not very helpful.
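For the leave-one-out case asked about, one option is to pass an explicit splitter instead of an integer, which avoids stratification altogether. The following is only a minimal sketch: it assumes a recent scikit-learn where `sklearn.model_selection` replaces the old `sklearn.cross_validation` module, and the data (50 samples split 25/20/5 over 3 classes) and the `KNeighborsClassifier` settings are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the question's data (an assumption):
# 50 samples, 3 classes, the smallest class has 5 members.
rng = np.random.RandomState(0)
y = np.array([0] * 25 + [1] * 20 + [2] * 5)
X = rng.normal(size=(50, 4)) + y[:, None]  # shift features by class label

classifier = KNeighborsClassifier(n_neighbors=3)

# LeaveOneOut yields one split per sample; each test set holds one sample,
# so every per-fold accuracy is either 0.0 or 1.0.
loo_scores = cross_val_score(classifier, X, y, cv=LeaveOneOut())
loo_accuracy = loo_scores.mean()
print("leave-one-out accuracy: %.3f" % loo_accuracy)
```

The mean of the per-sample scores is the leave-one-out accuracy estimate; the individual scores are not meaningful on their own.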
Btw, in general I'd suggest you use StratifiedShuffleSplit for such a small dataset.
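A rough sketch of what that could look like, again assuming a recent scikit-learn with `sklearn.model_selection`. The data, the number of splits, and the test size are illustrative choices, not values from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption): 50 samples, classes of size 25/20/5.
rng = np.random.RandomState(0)
y = np.array([0] * 25 + [1] * 20 + [2] * 5)
X = rng.normal(size=(50, 4)) + y[:, None]

# 20 random train/test splits; each test set holds 20% of the samples
# and keeps the class proportions of y.
splitter = StratifiedShuffleSplit(n_splits=20, test_size=0.2, random_state=0)

sss_scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=splitter)
print("mean accuracy over %d splits: %.3f" % (len(sss_scores), sss_scores.mean()))
```

Unlike k-fold, the number of splits here is independent of the class sizes, so it can be raised to reduce the variance of the estimate without running into the minimum-members constraint.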
[edit]: the current version gives a warning, which should probably be an error:
sklearn/cross_validation.py:399: Warning: The least populated class in y has only 13 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=68.
% (min_labels, self.n_folds)), Warning)
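The constraint in that warning can be checked before calling cross-validation. Below is a small plain-Python sketch; `max_stratified_folds` is a hypothetical helper name, not a scikit-learn function, and the label counts are made up to match the "only 5 members" case from the question.

```python
from collections import Counter

def max_stratified_folds(labels):
    """Largest fold count for which every class can appear in every fold.

    Hypothetical helper: stratified k-fold needs at least k members
    in the least populated class.
    """
    return min(Counter(labels).values())

# Illustrative labels (an assumption): 3 classes, smallest class has 5 members.
labels = [0] * 25 + [1] * 20 + [2] * 5

limit = max_stratified_folds(labels)
requested = 50
n_folds = min(requested, limit)  # cap the request at what stratification allows
print("requested=%d, usable stratified folds=%d" % (requested, n_folds))
```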
Upvotes: 5