Tengerye

Reputation: 1964

Sklearn: NotFittedError: This SVC instance is not fitted yet. Soft Voting on Calibration classifiers

I am trying to apply soft voting to calibrated classifiers in sklearn. Since VotingClassifier has no prefit option so far, I tried to have VotingClassifier.fit() call CalibratedClassifierCV.fit(). Here is my code:

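from sklearn import svm
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()

# Data splitting: train / validation / test.
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25)

# Base classifiers, fitted on the training split.
clf_svm = svm.SVC(gamma=0.001, probability=True)
clf_svm.fit(X_train, y_train)

clf_lr = LogisticRegression(random_state=0, solver='lbfgs')
clf_lr.fit(X_train, y_train)

# Calibrate the prefit classifiers on the validation split.
svm_isotonic = CalibratedClassifierCV(clf_svm, cv='prefit', method='isotonic')
svm_isotonic.fit(X_val, y_val)

lr_isotonic = CalibratedClassifierCV(clf_lr, cv='prefit', method='isotonic')
lr_isotonic.fit(X_val, y_val)

# Soft voting over the two calibrated classifiers.
eclf_soft2 = VotingClassifier(estimators=[
    ('svm', svm_isotonic), ('lr', lr_isotonic)], voting='soft')
eclf_soft2.fit(X_val, y_val)  # raises NotFittedError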

However, the last call fails with the following error:

Traceback (most recent call last):
  File "/home/ubuntu/projects/faceRecognition/faceVerif/util/plot_calibration.py", line 127, in <module>
    main(parse_arguments(sys.argv[1:]))
  File "/home/ubuntu/projects/faceRecognition/faceVerif/util/plot_calibration.py", line 120, in main
    eclf_soft2.fit(X_val, y_val)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/voting_classifier.py", line 189, in fit
    for clf in clfs if clf is not None)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/voting_classifier.py", line 31, in _parallel_fit_estimator
    estimator.fit(X, y)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/calibration.py", line 157, in fit
    calibrated_classifier.fit(X, y)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/calibration.py", line 335, in fit
    df, idx_pos_class = self._preproc(X)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/calibration.py", line 290, in _preproc
    df = self.base_estimator.decision_function(X)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py", line 527, in decision_function
    dec = self._decision_function(X)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py", line 384, in _decision_function
    X = self._validate_for_predict(X)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py", line 437, in _validate_for_predict
    check_is_fitted(self, 'support_')
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 768, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

My question is: how can I fix this error, or is there an alternative solution?

Thank you in advance.

Upvotes: 1

Views: 3209

Answers (1)

Vivek Kumar

Reputation: 36599

VotingClassifier clones the supplied estimators (and, as in this case, their internal estimators) and then tries to fit the clones. But in CalibratedClassifierCV you use cv='prefit', which assumes that the base estimator has already been fitted. These two behaviours conflict, and that causes this error.

Explanation:

VotingClassifier has two internal estimators:

  • ('svm', svm_isotonic),
  • ('lr', lr_isotonic)

When you call eclf_soft2.fit, it first clones svm_isotonic and lr_isotonic. Cloning these CalibratedClassifierCV estimators in turn clones their base estimators, clf_svm and clf_lr.

Cloning copies only the parameter values, not the attributes learnt from previous calls to fit(). So the cloned clf_svm and clf_lr are effectively unfitted.
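
You can check this behaviour of cloning directly. This is a minimal sketch, assuming a reasonably recent scikit-learn; the names fitted_svm and cloned_svm are only illustrative and are not part of the question's code:

```python
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import NotFittedError
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Fit an SVC, then clone it the way VotingClassifier would.
fitted_svm = SVC(gamma=0.001, probability=True).fit(X, y)
cloned_svm = clone(fitted_svm)

# Constructor parameters survive the clone...
same_params = cloned_svm.get_params() == fitted_svm.get_params()

# ...but learnt attributes such as support_ do not.
has_support_before = hasattr(fitted_svm, "support_")
has_support_after = hasattr(cloned_svm, "support_")

# Using the clone without fitting it reproduces the error from the question.
try:
    cloned_svm.decision_function(X)
    raised = False
except NotFittedError:
    raised = True

print(same_params, has_support_before, has_support_after, raised)
```

The clone keeps gamma and probability but has no support_, which is exactly the state check_is_fitted complains about in the traceback.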

Unfortunately, there is no simple way to get what you want here: fitting the VotingClassifier so that it fits the internal CalibratedClassifierCV estimators without refitting the base classifiers.
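
If refitting the base classifiers is acceptable, one possible workaround is to drop cv='prefit' and let CalibratedClassifierCV do its own internal cross-validation, so that the clones made by VotingClassifier can be fitted from scratch. Note that this is a different setup from the one in the question (there is no separate validation split for calibration), and cv=3, max_iter=5000 and random_state=0 are arbitrary choices for this sketch:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Unfitted base estimators; calibration uses internal 3-fold CV instead of 'prefit'.
svm_isotonic = CalibratedClassifierCV(SVC(gamma=0.001), cv=3, method="isotonic")
lr_isotonic = CalibratedClassifierCV(
    LogisticRegression(max_iter=5000), cv=3, method="isotonic")

# Nothing here depends on previously learnt attributes, so cloning is harmless.
eclf = VotingClassifier(
    estimators=[("svm", svm_isotonic), ("lr", lr_isotonic)], voting="soft")
eclf.fit(X_train, y_train)

proba = eclf.predict_proba(X_test)
print(proba.shape)
```

The trade-off is that the base classifiers and the calibrators are all trained on the data passed to eclf.fit, rather than on separate training and validation splits.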

But if you only want to use the VotingClassifier for its soft-voting capabilities on the combined system of two CalibratedClassifierCV estimators, this can be done easily.

Taking ideas from my other answer on a similar question:

You can do this:

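import numpy as np

# Define functions

# Fit each estimator in place. Nothing is cloned, so prefit base estimators stay fitted.
def custom_fit(estimators, X, y):
    for clf in estimators:
        clf.fit(X, y)

# Combine the estimators' predictions by hard or soft voting.
def custom_predict(estimators, X, voting='soft', weights=None):

    if voting == 'hard':
        # Shape (n_samples, n_estimators): one column of predicted labels per estimator.
        pred = np.asarray([clf.predict(X) for clf in estimators]).T
        # Weighted majority vote per sample (needs non-negative integer labels).
        pred = np.apply_along_axis(lambda x:
                                   np.argmax(np.bincount(x, weights=weights)),
                                   axis=1,
                                   arr=pred.astype('int'))
    else:
        # Shape (n_estimators, n_samples, n_classes).
        pred = np.asarray([clf.predict_proba(X) for clf in estimators])
        # Weighted mean of the probabilities over the estimators.
        pred = np.average(pred, axis=0, weights=weights)
        # Column index of the most probable class (the label itself when classes are 0..n-1).
        pred = np.argmax(pred, axis=1)

    return pred


# Use them
estimators = [svm_isotonic, lr_isotonic]
custom_fit(estimators, X_val, y_val)

custom_predict(estimators, X_test)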
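
To see what the soft-voting branch computes, here is a small numpy-only sketch. The probability values and the weights [1, 4] are made up for illustration and do not come from any fitted model:

```python
import numpy as np

# Hypothetical predict_proba outputs of two classifiers for two samples,
# stacked to shape (n_estimators, n_samples, n_classes).
stacked = np.asarray([
    [[0.8, 0.2], [0.3, 0.7]],   # classifier A
    [[0.4, 0.6], [0.1, 0.9]],   # classifier B
])

# Unweighted soft vote: plain mean over the estimator axis, then argmax.
unweighted = np.argmax(np.average(stacked, axis=0), axis=1)

# Weighted soft vote: classifier B counts four times as much as A.
weighted = np.argmax(np.average(stacked, axis=0, weights=[1, 4]), axis=1)

print(unweighted)  # [0 1]
print(weighted)    # [1 1]
```

For the first sample the unweighted average is [0.6, 0.4], so class 0 wins; with the weights the average becomes [0.48, 0.52] and the vote flips to class 1.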

Upvotes: 1
