Reputation: 4723
I am reading the documentation of Scikit-learn's OneVsRestClassifier()
, link. It looks to me that OneVsRestClassifier first binarizes the multiple classes into binary problems, trains a model for each of them, and then at the end "averages" the per-class scores into a final ML model that can predict multiple classes.
For my example, I have multiclass labels label1, label2, label3
. But instead of summarizing at the end, is it possible to use OneVsRestClassifier()
to give me the binary classifications, iteratively?
I would like to get 3 trained ML models. The first is for label1
vs. the rest (label2 and label3
), the second is for label2
vs. the rest (label1 and label3
), and the third is for label3
vs. the rest (label1 and label2
).
I understand I can manually binarize/dichotomize the outcome label and run the binary ML algorithm three times. But I wonder whether OneVsRestClassifier()
has a better, more efficient capability that replaces this manual work.
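For reference, the manual approach described in the question could be sketched as follows (using the iris dataset as a stand-in for the three-label data; the models dict and y_bin are illustrative names, not part of any library API):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes, like label1/label2/label3

# Train one binary model per class: class k vs. the rest
models = {}
for k in np.unique(y):
    y_bin = (y == k).astype(int)  # 1 for class k, 0 for all other classes
    models[k] = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y_bin)
```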
Upvotes: 3
Views: 1414
Reputation: 4211
Once you have trained your OneVsRestClassifier
model, all the binary classifiers are saved in the estimators_
attribute. Here is a quick example of how to use them:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
iris = load_iris()  # iris has 3 classes, just like your example
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

RFC = RandomForestClassifier(n_estimators=100, random_state=42)
OVRC = OneVsRestClassifier(RFC)
OVRC.fit(X_train, y_train)
Your three classifiers can be accessed via:
OVRC.estimators_[0] # label 0 vs the rest
OVRC.estimators_[1] # label 1 vs the rest
OVRC.estimators_[2] # label 2 vs the rest
Their individual predictions can be obtained as follows:
print(OVRC.estimators_[0].predict_proba(X_test[0:5]))
print(OVRC.estimators_[1].predict_proba(X_test[0:5]))
print(OVRC.estimators_[2].predict_proba(X_test[0:5]))
>>> [[1. 0. ]
[0.03 0.97] # vote for label 0
[1. 0. ]
[1. 0. ]
[1. 0. ]]
[[0.02 0.98] # vote for label 1
[0.97 0.03]
[0.97 0.03]
[0. 1. ] # vote for label 1
[0.19 0.81]] # vote for label 1
[[0.99 0.01]
[1. 0. ]
[0. 1. ] # vote for label 2
[0.99 0.01]
[0.85 0.15]]
This is consistent with the overall prediction, which is:
print(OVRC.predict_proba(X_test[0:5]))
>>> [[0. 0.98989899 0.01010101]
[0.97 0.03 0. ]
[0. 0.02912621 0.97087379]
[0. 0.99009901 0.00990099]
[0. 0.84375 0.15625 ]]
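The overall probabilities are simply each binary classifier's positive-class probability, normalized so that every row sums to 1. A minimal sketch verifying this (the names pos, manual, and overall are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ovrc = OneVsRestClassifier(RandomForestClassifier(n_estimators=100, random_state=42))
ovrc.fit(X_train, y_train)

# Column k holds estimator k's probability for its positive class (class k)
pos = np.column_stack([est.predict_proba(X_test)[:, 1] for est in ovrc.estimators_])
manual = pos / pos.sum(axis=1, keepdims=True)  # normalize each row to sum to 1

overall = ovrc.predict_proba(X_test)
```

For example, in the first test row above the three positive-class probabilities are 0, 0.98, and 0.01; dividing by their sum (0.99) gives exactly the 0.98989899 and 0.01010101 shown in the overall output.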
Upvotes: 5