Reputation: 21
I am testing a simple model (KNN) and trying to compare its results with an ensemble.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris
data = load_iris()
y = data.target
X = data.data
knn = KNeighborsClassifier()
bagging = BaggingClassifier(knn, max_samples=0.5, max_features=0.5)
print "KNN Score:\t", cross_val_score(knn, X, y, cv=5, n_jobs=-1).mean()
print "Bagging Score:\t", cross_val_score(bagging, X, y, cv=5, n_jobs=-1).mean()
But every time I run the code I get the same error estimate... Should it not be different every time?
Upvotes: 1
Views: 1174
Reputation: 19634
There are two scores that are calculated in your code. The first one,
print "KNN Score:\t", cross_val_score(knn, X, y, cv=5, n_jobs=-1).mean()
will always return the same value. The reason is that there is nothing random in this process: the data is exactly the same, and the division into 5 folds is exactly the same (as indicated here, the data is split into 5 consecutive folds).
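For instance, here is a minimal sketch (assuming Python 3 and a current scikit-learn) that runs the KNN cross-validation twice and confirms the fold scores are identical:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier()

# KNeighborsClassifier has no random component and the 5 folds are the same
# every time, so repeated runs produce identical fold scores.
scores_run_1 = cross_val_score(knn, X, y, cv=5)
scores_run_2 = cross_val_score(knn, X, y, cv=5)
print("Identical across runs:", (scores_run_1 == scores_run_2).all())  # True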
However, when calculating the following score:
print "Bagging Score:\t", cross_val_score(bagging, X, y, cv=5, n_jobs=-1).mean()
there is randomness in the process. For example, since max_samples=0.5, you draw half of the samples at random to train each base estimator. Hence each time you run the code you may get a different result.
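If you want the bagging score to be reproducible, you can fix the seed. A minimal sketch, assuming you are happy to pin the randomness via the random_state parameter of BaggingClassifier:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier()

# Without a seed, the random draws of samples and features differ on every
# run, so the mean cross-validation score changes as well.
bagging = BaggingClassifier(knn, max_samples=0.5, max_features=0.5)
print("Unseeded:", cross_val_score(bagging, X, y, cv=5).mean())

# With random_state fixed, the draws (and hence the score) are reproducible.
bagging_seeded = BaggingClassifier(knn, max_samples=0.5, max_features=0.5, random_state=0)
print("Seeded:  ", cross_val_score(bagging_seeded, X, y, cv=5).mean())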
Upvotes: 2