Reputation: 1964
I tried to use soft voting over calibrated classifiers in sklearn. Since VotingClassifier has no prefit option for soft voting so far, I tried to have VotingClassifier.fit() call CalibratedClassifierCV.fit(). The following is my code:
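from sklearn import svm
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
# Data splitting: 60% train, 20% validation (used for calibration), 20% test.
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25)
# Base classifiers.
clf_svm = svm.SVC(gamma=0.001, probability=True)
clf_svm.fit(X_train, y_train)
clf_lr = LogisticRegression(random_state=0, solver='lbfgs')
clf_lr.fit(X_train, y_train)
# Calibrate the already fitted base classifiers on the validation set.
svm_isotonic = CalibratedClassifierCV(clf_svm, cv='prefit', method='isotonic')
svm_isotonic.fit(X_val, y_val)
lr_isotonic = CalibratedClassifierCV(clf_lr, cv='prefit', method='isotonic')
lr_isotonic.fit(X_val, y_val)
# Soft voting over the calibrated classifiers; this fit() call raises the error.
eclf_soft2 = VotingClassifier(
    estimators=[('svm', svm_isotonic), ('lr', lr_isotonic)], voting='soft')
eclf_soft2.fit(X_val, y_val)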
However, I got some strange errors:
Traceback (most recent call last):
File "/home/ubuntu/projects/faceRecognition/faceVerif/util/plot_calibration.py", line 127, in <module>
main(parse_arguments(sys.argv[1:]))
File "/home/ubuntu/projects/faceRecognition/faceVerif/util/plot_calibration.py", line 120, in main
eclf_soft2.fit(X_val, y_val)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/voting_classifier.py", line 189, in fit
for clf in clfs if clf is not None)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
while self.dispatch_one_batch(iterator):
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
self._dispatch(tasks)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
result = ImmediateResult(func)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
self.results = batch()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/voting_classifier.py", line 31, in _parallel_fit_estimator
estimator.fit(X, y)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/calibration.py", line 157, in fit
calibrated_classifier.fit(X, y)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/calibration.py", line 335, in fit
df, idx_pos_class = self._preproc(X)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/calibration.py", line 290, in _preproc
df = self.base_estimator.decision_function(X)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py", line 527, in decision_function
dec = self._decision_function(X)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py", line 384, in _decision_function
X = self._validate_for_predict(X)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py", line 437, in _validate_for_predict
check_is_fitted(self, 'support_')
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 768, in check_is_fitted
raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
How can I fix this error, or is there an alternative solution?
Thank you in advance.
Reputation: 36599
VotingClassifier clones the supplied estimators (and, as in this case, their internal estimators) and then tries to fit the clones. But you created the CalibratedClassifierCV instances with cv='prefit', which assumes that the base estimators are already fitted. These two behaviors conflict, and that produces this error.
Explanation:
VotingClassifier has two internal estimators: ('svm', svm_isotonic) and ('lr', lr_isotonic). When you call eclf_soft2.fit, it first clones svm_isotonic and lr_isotonic. Cloning these CalibratedClassifierCV estimators in turn clones their base estimators, clf_svm and clf_lr.
Cloning copies only the parameter values, not the attributes learnt from previous calls to fit(). So the cloned clf_svm and clf_lr are effectively unfitted.
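A minimal sketch of that cloning behavior, using sklearn.base.clone directly on a fitted LogisticRegression. The synthetic dataset and the variable names are my own choices for illustration, not part of the question's code:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset, used only for this illustration.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

fitted = LogisticRegression().fit(X, y)
cloned = clone(fitted)

# The constructor parameters survive cloning...
same_params = fitted.get_params() == cloned.get_params()
# ...but attributes learnt in fit(), such as coef_, do not.
fitted_has_coef = hasattr(fitted, "coef_")
cloned_has_coef = hasattr(cloned, "coef_")

print(same_params, fitted_has_coef, cloned_has_coef)
```

The clone behaves like a freshly constructed estimator, which is why a prefit CalibratedClassifierCV ends up wrapping an unfitted base estimator after cloning.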
Unfortunately, there is no simple way to get what you want here: fitting the VotingClassifier so that it fits the internal CalibratedClassifierCV estimators without refitting the base classifiers.
But if you only want to use the VotingClassifier for its soft-voting capabilities on the combined system of two CalibratedClassifierCV estimators, this can be done easily.
Taking ideas from my other answer on a similar question, you can do this:
import numpy as np

# Define functions
def custom_fit(estimators, X, y):
    # Fit each estimator in place (no cloning, so prefit base estimators stay fitted).
    for clf in estimators:
        clf.fit(X, y)

def custom_predict(estimators, X, voting='soft', weights=None):
    if voting == 'hard':
        # Majority vote over predicted labels (assumes non-negative integer labels).
        pred = np.asarray([clf.predict(X) for clf in estimators]).T
        pred = np.apply_along_axis(
            lambda x: np.argmax(np.bincount(x, weights=weights)),
            axis=1,
            arr=pred.astype('int'))
    else:
        # Average the class probabilities, then pick the most probable class.
        # This returns class indices, which equal the labels here (0 and 1).
        pred = np.asarray([clf.predict_proba(X) for clf in estimators])
        pred = np.average(pred, axis=0, weights=weights)
        pred = np.argmax(pred, axis=1)
    return pred

# Use them
estimators = [svm_isotonic, lr_isotonic]
custom_fit(estimators, X_val, y_val)
custom_predict(estimators, X_test)
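To see what the soft-voting branch computes, here is a small worked example on hand-written probabilities. The arrays p_svm and p_lr are made-up stand-ins for the predict_proba outputs of the two calibrated classifiers, not real results:

```python
import numpy as np

# Hypothetical predict_proba outputs for 3 samples and 2 classes.
p_svm = np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])
p_lr = np.array([[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]])

stacked = np.asarray([p_svm, p_lr])  # shape: (n_estimators, n_samples, n_classes)

# Unweighted soft vote: mean probability per class, then argmax.
unweighted = np.argmax(np.average(stacked, axis=0), axis=1)

# Weighted soft vote: the first estimator counts three times as much.
weighted = np.argmax(np.average(stacked, axis=0, weights=[3, 1]), axis=1)

print(unweighted)  # [0 0 1]
print(weighted)    # [0 1 1]
```

The second sample flips from class 0 to class 1 once the first estimator is weighted more heavily, which is the effect the weights argument of custom_predict has.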