Reputation: 145
In a classification problem, it is sometimes not enough to predict a class; we also need the probability of each class,
i.e. P(y=0|x), P(y=1|x), P(y=2|x), ..., P(y=C|x),
without building a separate classifier for each of y=0, y=1, y=2, ..., y=C, since training C classifiers (say C=100) can be quite slow.
How can this be done? Which classifiers naturally return all class probabilities at once? One option I know of is a neural network with 100 output nodes. With a traditional random forest, I can't do that, right? I use the Python scikit-learn library.
Upvotes: 1
Views: 2109
Reputation: 150
Random forests do give P(Y|x) for multiple classes. In most cases P(Y|x) can be taken as:
P(Y|x) = (number of trees that vote for the class) / (total number of trees).
However, you can build on this. Suppose that in one case the top class has 260 votes, the second class 230 votes, and the other 5 classes 10 votes in total, while in another case class 1 has 260 votes and the other classes have 40 votes each. The top class has 260 votes in both, yet you might feel more confident in your prediction in the second case than in the first, so you can define a confidence metric that suits your use case.
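To make the two cases concrete, here is a minimal sketch in plain Python. The function names and the "gap between the top two classes" metric are my own illustration, not a standard API, and I am assuming 500 trees and 7 classes, with the 10 leftover votes in the first case split evenly.

```python
def vote_fractions(votes):
    """Turn per-class vote counts into probability estimates."""
    total = sum(votes)
    return [v / total for v in votes]


def top_two_margin(votes):
    """Hypothetical confidence metric: gap between the two largest vote fractions."""
    probs = sorted(vote_fractions(votes), reverse=True)
    return probs[0] - probs[1]


# Assumed vote counts for the two cases (500 trees, 7 classes).
case_1 = [260, 230, 2, 2, 2, 2, 2]       # runner-up is close behind
case_2 = [260, 40, 40, 40, 40, 40, 40]   # remaining votes are spread out

for name, votes in [("case 1", case_1), ("case 2", case_2)]:
    top_probability = max(vote_fractions(votes))
    print(name, "top probability:", top_probability,
          "margin:", round(top_two_margin(votes), 3))
```

Both cases give the top class the same vote fraction, but the margin is much larger in the second case, which matches the intuition that the second prediction is the safer one.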
Upvotes: 0
Reputation: 12808
If you want probabilities, look for scikit-learn classifiers that have a predict_proba() method.
The scikit-learn documentation on multiclass classification: [http://scikit-learn.org/stable/modules/multiclass.html]
All scikit-learn classifiers are capable of multiclass classification, so you don't need to build 100 models yourself.
The documentation page linked above includes a summary of the classifiers supported by scikit-learn, grouped by strategy.
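As a rough sketch of what this looks like in practice, the snippet below fits a single RandomForestClassifier and asks it for all class probabilities at once. The synthetic dataset, the 5 classes, and the hyperparameters are placeholders I picked for illustration; only predict_proba() and classes_ come from scikit-learn itself. Note that scikit-learn's forest averages the class probabilities predicted by each tree rather than counting hard votes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real data: 5 classes, 20 features (assumed values).
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=10,
    n_classes=5,
    random_state=0,
)

# One model handles every class; no per-class classifiers are needed.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# One row per sample, one column per class, ordered as in clf.classes_.
proba = clf.predict_proba(X[:3])
print(clf.classes_)
print(proba.shape)
print(proba.sum(axis=1))  # each row sums to 1 (up to floating-point error)
```

Column j of the result is the estimate of P(y = clf.classes_[j] | x), so the same call works unchanged for 100 classes.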
Upvotes: 2