Reputation: 21
I am testing a simple model (KNN) and trying to compare its results with an ensemble.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris
data = load_iris()
y = data.target
X = data.data
knn = KNeighborsClassifier()
bagging = BaggingClassifier(knn, max_samples=0.5, max_features=0.5)
print "KNN Score:\t", cross_val_score(knn, X, y, cv=5, n_jobs=-1).mean()
print "Bagging Score:\t", cross_val_score(bagging, X, y, cv=5, n_jobs=-1).mean()
But every time I run the code I get the same error estimate... Should it not be different every time?
Upvotes: 1
Views: 1174
Reputation: 19634
There are two scores that are calculated in your code. The first one,
print "KNN Score:\t", cross_val_score(knn, X, y, cv=5, n_jobs=-1).mean()
will always return the same value. The reason is that there is nothing random in this process: the data is exactly the same, and the division into 5 folds is exactly the same (as indicated here, the data is split into 5 consecutive folds).
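For instance, here is a minimal sketch (assuming Python 3 and a current scikit-learn) that runs the KNN cross-validation twice and confirms the fold scores are identical:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier()

# KNeighborsClassifier has no random component and the 5 folds are the same
# every time, so repeated runs produce identical fold scores.
scores_run_1 = cross_val_score(knn, X, y, cv=5)
scores_run_2 = cross_val_score(knn, X, y, cv=5)
print("Identical across runs:", (scores_run_1 == scores_run_2).all())  # True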
However, when calculating the following score:
print "Bagging Score:\t", cross_val_score(bagging, X, y, cv=5, n_jobs=-1).mean()
there is randomness in the process. For example, since max_samples=0.5, you draw half of the samples at random to train each base estimator. Hence each time you run the code you may get a different result.
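If you want the bagging score to be reproducible, you can fix the seed. A minimal sketch, assuming you are happy to pin the randomness via the random_state parameter of BaggingClassifier:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier()

# Without a seed, the random draws of samples and features differ on every
# run, so the mean cross-validation score changes as well.
bagging = BaggingClassifier(knn, max_samples=0.5, max_features=0.5)
print("Unseeded:", cross_val_score(bagging, X, y, cv=5).mean())

# With random_state fixed, the draws (and hence the score) are reproducible.
bagging_seeded = BaggingClassifier(knn, max_samples=0.5, max_features=0.5, random_state=0)
print("Seeded:  ", cross_val_score(bagging_seeded, X, y, cv=5).mean())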
Upvotes: 2