Reputation: 1547
Does anyone know if sklearn supports different parameters for the various classifiers inside a OneVsRestClassifier? For instance, in this example, I would like to have different values of C
for the different classes.
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
text_clf = OneVsRestClassifier(LinearSVC(C=1.0, class_weight="balanced"))
Upvotes: 1
Views: 711
Reputation: 36599
No, OneVsRestClassifier does not currently support different parameters (or different estimators) for different classes.
There are some estimators, like LogisticRegressionCV, which automatically tune parameter values per class, but this has not been extended to OneVsRestClassifier yet.
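For illustration, LogisticRegressionCV exposes the selected regularization strength per class through its C_ attribute. This is a hedged sketch on a synthetic dataset; the grid of Cs is illustrative, and depending on the version and multiclass strategy the per-class values may coincide:

```python
# Sketch: LogisticRegressionCV searches the Cs grid with cross-validation
# and stores one selected C per class in C_. Dataset and grid are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=200, n_features=10, n_classes=3,
                           n_informative=4, random_state=0)
clf = LogisticRegressionCV(Cs=[0.01, 1.0, 100.0], cv=3, max_iter=1000).fit(X, y)
print(clf.C_)  # array with one entry per class
```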
But if you want that, we can change the source to implement it.
The current source of fit() in the master branch is this:
...
...
self.estimators_ = Parallel(n_jobs=self.n_jobs)(delayed(_fit_binary)(
    self.estimator, X, column, classes=[
        "not %s" % self.label_binarizer_.classes_[i],
        self.label_binarizer_.classes_[i]])
    for i, column in enumerate(columns))
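A quick check on synthetic data (a hedged sketch; the dataset and C value are illustrative) confirms that the stock OneVsRestClassifier clones the same estimator, with the same parameters, for every class:

```python
# Each entry of estimators_ is a clone of the single estimator passed in,
# so all per-class classifiers end up with the same C.
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=150, n_features=10, n_classes=3,
                           n_informative=4, random_state=0)
ovr = OneVsRestClassifier(LinearSVC(C=1.0)).fit(X, y)
print([est.C for est in ovr.estimators_])  # the same C for all three classes
```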
As you can see, the same estimator (self.estimator) is being passed to all classes for training. So we will make a new version of OneVsRestClassifier to change this:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.externals.joblib import Parallel, delayed  # on newer scikit-learn: from joblib import Parallel, delayed
from sklearn.multiclass import _fit_binary

class CustomOneVsRestClassifier(OneVsRestClassifier):

    # Changed the single estimator to `estimators`, which can take a list now
    def __init__(self, estimators, n_jobs=1):
        self.estimators = estimators
        self.n_jobs = n_jobs

    def fit(self, X, y):
        self.label_binarizer_ = LabelBinarizer(sparse_output=True)
        Y = self.label_binarizer_.fit_transform(y)
        Y = Y.tocsc()
        self.classes_ = self.label_binarizer_.classes_
        columns = (col.toarray().ravel() for col in Y.T)

        # This is where we change the training method: each binary column
        # is now paired with its own estimator instead of reusing self.estimator
        self.estimators_ = Parallel(n_jobs=self.n_jobs)(delayed(_fit_binary)(
            estimator, X, column, classes=[
                "not %s" % self.label_binarizer_.classes_[i],
                self.label_binarizer_.classes_[i]])
            for i, (column, estimator) in enumerate(zip(columns, self.estimators)))

        return self
And now you can use it.
from sklearn.svm import LinearSVC

# Make sure you add as many estimators as there are classes
# (in the binary case, only a single estimator should be used)
estimators = []

# I am considering 3 classes here
estimators.append(LinearSVC(C=1.0, class_weight="balanced"))
estimators.append(LinearSVC(C=0.1, class_weight="balanced"))
estimators.append(LinearSVC(C=10, class_weight="balanced"))

clf = CustomOneVsRestClassifier(estimators)
clf.fit(X, y)
Note: I haven't implemented partial_fit() in it yet. If you intend to use that, we can work on it.
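If you would rather avoid importing the private _fit_binary helper, the same idea can be sketched without touching sklearn internals: fit one binary LinearSVC per class, each with its own C, and predict by the argmax of the per-class decision functions. The C values and synthetic dataset below are purely illustrative:

```python
# Hedged sketch of one-vs-rest with a different C per class, using only
# public sklearn APIs: one binary LinearSVC per class, argmax to predict.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=10, n_classes=3,
                           n_informative=4, random_state=0)
Cs = {0: 1.0, 1: 0.1, 2: 10.0}  # one (illustrative) C per class
classes = np.unique(y)
fitted = [LinearSVC(C=Cs[c], class_weight="balanced").fit(X, (y == c).astype(int))
          for c in classes]
# decision_function gives one score per sample; stack them class-wise
scores = np.column_stack([est.decision_function(X) for est in fitted])
pred = classes[scores.argmax(axis=1)]
print((pred == y).mean())  # training accuracy
```

This loses OneVsRestClassifier conveniences like n_jobs parallelism and label binarization, but it is robust to sklearn refactoring its private functions.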
Upvotes: 3