Reputation: 21
For a classification task, I am using sklearn's VotingClassifier to ensemble a Random Forest and an Extra-Trees classifier with the parameter voting='hard'. I don't understand how this works, since both tree-based models already produce their final prediction by voting internally. How can they work in combination using hard voting? And what happens if there is a tie between the two models?
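For reference, here is a minimal sketch of my setup (the estimator settings are just illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# ensemble the two tree-based models with hard (majority) voting
clf = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("et", ExtraTreesClassifier(random_state=0))],
    voting="hard",
)
clf.fit(X, y)
print(clf.predict(X[:5]))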
Can anyone explain this with an example?
Upvotes: 0
Views: 1123
Reputation: 4521
You can look this up in the source code of the VotingClassifier. In short, it doesn't make much sense to use only two classifiers with hard voting; rather use soft voting.
The reason is that in hard-voting mode, the sklearn VotingClassifier takes the majority vote, and with only two classifiers it gets interesting exactly when there is a tie. In case there are as many zeros as ones in a binary classification, the hard-voting classifier will vote for 0.
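You can see this behaviour directly on the estimator itself. Here is a minimal sketch (the DummyClassifier stand-ins are only for illustration): two classifiers that always disagree produce a tie on every sample, and hard voting resolves every tie to class 0.

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import VotingClassifier

X = np.zeros((4, 1))        # features are irrelevant for DummyClassifier
y = np.array([0, 1, 0, 1])  # both classes must appear in y for fitting

# two "classifiers" that always disagree -> every prediction is a tie
always_0 = DummyClassifier(strategy="constant", constant=0)
always_1 = DummyClassifier(strategy="constant", constant=1)

vc = VotingClassifier(estimators=[("c0", always_0), ("c1", always_1)],
                      voting="hard")
vc.fit(X, y)
print(vc.predict(X))  # [0 0 0 0] -> the tie is always resolved to class 0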
You can simply test this by running the code that it executes:
First set up the data for the experiment:
import numpy as np

# create a random int array with values 0 and 1,
# with 20 rows (20 predictions) of 10 voters (10 columns);
# no seed is set, so rows 4-19 will differ from run to run
a = np.random.randint(0, 2, size=(20, 10))

# then overwrite some rows with different tie combinations
a[0, :] = [0]*5 + [1]*5  # the first 5 voters predict 0, the next 5 predict 1
a[1, :] = [1]*5 + [0]*5  # vice versa
a[2, :] = [0, 1]*5       # alternating predictions, starting with 0
a[3, :] = [1, 0]*5       # alternating predictions, starting with 1

# if you want to check how many ones each row has:
a.sum(axis=1)
Now see what the voter code does with this. The core of the hard-voting logic is the following (the code below simulates the case where you have ten equally weighted classifiers, weights=[1.0]*10):
np.apply_along_axis(
    lambda x: np.argmax(np.bincount(x, weights=[1.0]*10)),
    axis=1, arr=a)
One possible result is (the first four entries are fixed by our tie rows, the rest depend on the random data):
array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
You see that for the first four entries (the ties we manually introduced), the result is 0. So in case of a tie, the voting classifier will always choose 0 (if you give each classifier the same weight). Note that the weights are not only used to resolve ties: one classifier could have double the weight of another, so you can even get a tie with 3 classifiers this way. But whenever the summed prediction weight of all 0-predicting classifiers equals the summed prediction weight of all 1-predicting classifiers, the voting classifier will predict 0 rather than 1.
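A quick worked example of that weighted-tie case (the weights here are made up for illustration):

import numpy as np

# a hypothetical 3-classifier ensemble: one classifier with weight 2,
# two classifiers with weight 1 each
preds = np.array([0, 1, 1])   # the heavy classifier predicts 0
weights = [2.0, 1.0, 1.0]

counts = np.bincount(preds, weights=weights)
print(counts)             # [2. 2.] -> a tie despite 3 classifiers
print(np.argmax(counts))  # 0 -> argmax picks the first maximum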
Here are the relevant references: Sklearn Voting code and Description of numpy.argmax
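As an aside, the soft voting recommended above averages the estimators' predicted probabilities instead of counting votes, so exact ties become much less likely. A minimal sketch (estimator settings are again illustrative), including a check that the ensemble's probabilities are the plain mean when the weights are equal:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier

X, y = make_classification(n_samples=200, random_state=0)

soft = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("et", ExtraTreesClassifier(random_state=0))],
    voting="soft",
).fit(X, y)

# soft voting averages the estimators' predict_proba outputs
avg = np.mean([est.predict_proba(X[:5]) for est in soft.estimators_], axis=0)
print(np.allclose(avg, soft.predict_proba(X[:5])))  # True for equal weights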
Upvotes: 2