Clement Attlee

Reputation: 733

How does voting between two classifiers work in sklearn?

For a classification task, I am using a voting classifier to ensemble logistic regression and an SVM, with the voting parameter set to soft. The result is clearly better than either individual model. I am not sure I understand how it works, though. How can the model find a majority vote between only two models?

Upvotes: 4

Views: 4613

Answers (1)

Pratik Kumar

Reputation: 2231

Assume you have two classes, class-A and class-B.

Logistic regression (which has a built-in predict_proba() method) and SVC (with probability=True) can both estimate class probabilities: each predicts that an input is class-A with probability a and class-B with probability b. If a > b, the predicted class is A; otherwise it is B. In a voting classifier, setting the voting parameter to soft makes each model (SVM and logistic regression) compute its own probabilities (also called confidence scores) and pass them to the voting classifier, which averages them per class and outputs the class with the highest averaged probability. So with soft voting there is no majority to count, which is why it works with only two models.
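As a rough illustration of the averaging described above, here is a minimal sketch that reproduces the soft vote by hand and compares it with what VotingClassifier returns. The synthetic dataset, the estimator labels "lr" and "svm", and the variable names are assumptions made for this example, not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
voter.fit(X_train, y_train)

# Class probabilities from each fitted sub-model, shape (n_samples, n_classes).
per_model = [est.predict_proba(X_test) for est in voter.estimators_]

# Soft voting with default (equal) weights: average per class, then take the argmax.
manual_avg = np.mean(per_model, axis=0)
manual_pred = voter.classes_[np.argmax(manual_avg, axis=1)]

print(np.allclose(manual_avg, voter.predict_proba(X_test)))
print(np.array_equal(manual_pred, voter.predict(X_test)))
```

If a weights argument is passed to VotingClassifier, the plain mean above would need to become a weighted average.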

Make sure that, if you set voting='soft', every classifier you provide can also compute these probabilities.
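One quick way to check this before building the ensemble is to test whether each estimator exposes predict_proba. This is only a sketch; the candidates dictionary and its labels are made up for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hypothetical candidates for a soft-voting ensemble.
candidates = {
    "LogisticRegression": LogisticRegression(),
    "SVC (default)": SVC(),
    "SVC (probability=True)": SVC(probability=True),
}

# An estimator can take part in soft voting only if it exposes predict_proba.
supports_soft = {
    name: hasattr(est, "predict_proba") for name, est in candidates.items()
}

for name, ok in supports_soft.items():
    print(name, ok)
```

SVC only exposes predict_proba when it is constructed with probability=True, so the default SVC is the one that would fail in a soft-voting ensemble.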

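To compare the classifiers, you can print each one's accuracy (call predict_proba(X_test) instead to see the per-class confidence):

from sklearn.metrics import accuracy_score
# classifier_name is a trained SVM / LogisticRegression / VotingClassifier
y_pred = classifier_name.predict(X_test)
# y_true holds the true labels for X_test
print(classifier_name.__class__.__name__, accuracy_score(y_true, y_pred))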

NOTE: a + b may not appear to be exactly 1 because of floating-point round-off, but it is 1. I can't speak for other confidence scores such as decision functions, but it holds for predict_proba().
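To check the note numerically, a small sketch like the following compares each row sum of predict_proba() against 1 within floating-point tolerance. The toy dataset and variable names are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy two-class data (an assumption for this example).
X, y = make_classification(n_samples=200, n_features=6, random_state=1)

models = [
    LogisticRegression(max_iter=1000),
    SVC(probability=True, random_state=1),
]

row_sums = {}
for model in models:
    proba = model.fit(X, y).predict_proba(X)
    # Each row holds (a, b); the sum should be 1 up to round-off.
    row_sums[model.__class__.__name__] = proba.sum(axis=1)

for name, sums in row_sums.items():
    print(name, np.allclose(sums, 1.0))
```

Comparing with np.allclose rather than == avoids false alarms from round-off in the last few digits.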

Upvotes: 9
