Reputation: 24930
It takes a while to get to the actual question, so please bear with me. The AdaBoost documentation states that it "is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted". To do that, one of the parameters it takes is base_estimator. For the base_estimator to be usable with AdaBoostClassifier, "support for sample weighting is required".
So my first issue was - which classifiers provide support for sample weighting? I did some research, and, fortunately, someone smarter than me had the answer. Somewhat updated, it works thus: by running
from sklearn.utils.testing import all_estimators
print(all_estimators(type_filter='classifier'))
you get a list of all classifiers (turns out there are 31 of them!). Then, if you run
import inspect
for name, clf in all_estimators(type_filter='classifier'):
    # keep classifiers whose fit() accepts a sample_weight argument
    if 'sample_weight' in inspect.getfullargspec(clf().fit)[0]:
        print(name)
you can get the list of all classifiers which provide support for sample weighting (21 of them, for the curious).
So far so good. But now we have to deal with another AdaBoostClassifier parameter, namely algorithm. You have two options: {‘SAMME’, ‘SAMME.R’}, optional (default=’SAMME.R’). We're told that to "use the SAMME.R real boosting algorithm base_estimator must support calculation of class probabilities". And this is where I got stuck. Searching online, I can only find two classifiers used with ‘SAMME.R’ as an argument for algorithm: DecisionTreeClassifier (which is the default) and RandomForestClassifier.
So here's the question - which other classifiers from the 21 which are compatible with AdaBoostClassifier offer support for the calculation of class probabilities?
Thanks.
Upvotes: 2
Views: 3681
Reputation: 16079
I am pretty sure that when the documentation says the base estimator "must support calculation of class probabilities", it means that the estimator has a predict_proba method.
This is the method that many classifiers use to return the probabilities for each class given an observation. With that understanding you just need to check for classifiers that have the predict_proba method:
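from sklearn.utils.testing import all_estimators
for name, clf in all_estimators(type_filter='classifier'):
    # clf is the class itself; check whether it defines predict_proba
    if hasattr(clf, 'predict_proba'):
        print(clf, name)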
<class 'sklearn.ensemble.weight_boosting.AdaBoostClassifier'> AdaBoostClassifier
<class 'sklearn.ensemble.bagging.BaggingClassifier'> BaggingClassifier
<class 'sklearn.naive_bayes.BernoulliNB'> BernoulliNB
<class 'sklearn.calibration.CalibratedClassifierCV'> CalibratedClassifierCV
<class 'sklearn.naive_bayes.ComplementNB'> ComplementNB
<class 'sklearn.tree.tree.DecisionTreeClassifier'> DecisionTreeClassifier
<class 'sklearn.tree.tree.ExtraTreeClassifier'> ExtraTreeClassifier
<class 'sklearn.ensemble.forest.ExtraTreesClassifier'> ExtraTreesClassifier
<class 'sklearn.naive_bayes.GaussianNB'> GaussianNB
<class 'sklearn.gaussian_process.gpc.GaussianProcessClassifier'> GaussianProcessClassifier
<class 'sklearn.ensemble.gradient_boosting.GradientBoostingClassifier'> GradientBoostingClassifier
<class 'sklearn.neighbors.classification.KNeighborsClassifier'> KNeighborsClassifier
<class 'sklearn.semi_supervised.label_propagation.LabelPropagation'> LabelPropagation
<class 'sklearn.semi_supervised.label_propagation.LabelSpreading'> LabelSpreading
<class 'sklearn.discriminant_analysis.LinearDiscriminantAnalysis'> LinearDiscriminantAnalysis
<class 'sklearn.linear_model.logistic.LogisticRegression'> LogisticRegression
<class 'sklearn.linear_model.logistic.LogisticRegressionCV'> LogisticRegressionCV
<class 'sklearn.neural_network.multilayer_perceptron.MLPClassifier'> MLPClassifier
<class 'sklearn.naive_bayes.MultinomialNB'> MultinomialNB
<class 'sklearn.svm.classes.NuSVC'> NuSVC
<class 'sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis'> QuadraticDiscriminantAnalysis
<class 'sklearn.ensemble.forest.RandomForestClassifier'> RandomForestClassifier
<class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> SGDClassifier
<class 'sklearn.svm.classes.SVC'> SVC
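So you end up with 24 of the 31 classifiers exposing predict_proba. Because base_estimator in AdaBoostClassifier also needs sample weighting, the potential options are the classifiers that appear both in this list and in your list of 21 (KNeighborsClassifier, for example, has predict_proba but its fit does not take sample_weight).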
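If you want to apply both requirements (a sample_weight argument in fit, and a predict_proba method) in one pass, here is a minimal sketch. The helper names supports_sample_weight, supports_predict_proba, samme_r_candidates and list_sklearn_candidates, as well as the three stand-in classes, are my own and purely illustrative; they are not part of scikit-learn. The stand-ins are only there so the filter can be tried without scikit-learn installed.

```python
import inspect


def supports_sample_weight(estimator_cls):
    """Return True if the class's fit() accepts a sample_weight argument."""
    fit = getattr(estimator_cls, "fit", None)
    if fit is None:
        return False
    return "sample_weight" in inspect.signature(fit).parameters


def supports_predict_proba(estimator_cls):
    """Return True if the class exposes a predict_proba attribute."""
    return hasattr(estimator_cls, "predict_proba")


def samme_r_candidates(named_classes):
    """Keep the names of the classes that pass both checks."""
    return [
        name
        for name, cls in named_classes
        if supports_sample_weight(cls) and supports_predict_proba(cls)
    ]


# Hypothetical stand-in classes (not real estimators), for illustration only.
class WeightedProba:
    def fit(self, X, y, sample_weight=None):
        return self

    def predict_proba(self, X):
        return None


class ProbaOnly:
    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        return None


class WeightedOnly:
    def fit(self, X, y, sample_weight=None):
        return self


demo = [
    ("WeightedProba", WeightedProba),
    ("ProbaOnly", ProbaOnly),
    ("WeightedOnly", WeightedOnly),
]
print(samme_r_candidates(demo))  # ['WeightedProba']


def list_sklearn_candidates():
    """Apply the same filter to scikit-learn's classifiers (needs scikit-learn)."""
    try:
        from sklearn.utils import all_estimators  # location in newer releases
    except ImportError:
        from sklearn.utils.testing import all_estimators  # older location, as in the question
    return samme_r_candidates(all_estimators(type_filter="classifier"))
```

Calling list_sklearn_candidates() should give you a shortlist for your installed version. Treat it as a shortlist only: hasattr on the class just tells you the attribute exists. For SVC, for instance, predict_proba is only usable when the estimator is constructed with probability=True.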
The error returned from using an improper classifier as base_estimator is also quite helpful in this regard.
TypeError: AdaBoostClassifier with algorithm='SAMME.R' requires that the weak learner supports the calculation of class probabilities with a predict_proba method. Please change the base estimator or set algorithm='SAMME' instead.
As you can see, the error specifically points you towards classes with the predict_proba method.
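The error message also suggests the fallback: set algorithm='SAMME' when the weak learner has no predict_proba. A small sketch of making that choice up front is below. The function name choose_boosting_algorithm and the two stand-in classes are hypothetical, made up for this example, and the check is the same hasattr test as above rather than anything AdaBoostClassifier exposes itself.

```python
def choose_boosting_algorithm(base_estimator):
    """Return 'SAMME.R' if the estimator has predict_proba, otherwise 'SAMME'."""
    return "SAMME.R" if hasattr(base_estimator, "predict_proba") else "SAMME"


# Hypothetical stand-ins for a weak learner with and without predict_proba.
class WithProba:
    def predict_proba(self, X):
        return None


class WithoutProba:
    def predict(self, X):
        return None


print(choose_boosting_algorithm(WithProba()))     # SAMME.R
print(choose_boosting_algorithm(WithoutProba()))  # SAMME
```

You would then pass the returned string as the algorithm argument next to base_estimator when constructing AdaBoostClassifier. Both parameters have changed in more recent scikit-learn releases, so check the documentation for the version you are running.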
Upvotes: 5