user3368526

Reputation: 2328

ValueError: y contains non binary labels for VotingClassifier with my GradientBoostingClassifier

Hi, I am trying to use VotingClassifier with a GradientBoostingClassifier that I wrapped in order to make use of sample_weight. However, I get the error below and can't figure out how to fix it.

The code:

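from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)

class MyGradientBoostingClassifier(GradientBoostingClassifier):
    # Wrapper meant to pass sample_weight; note it reuses the labels y as the weights
    def fit(self, X, y=None):
        return super(GradientBoostingClassifier, self).fit(X, y, sample_weight=y)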


rf = RandomForestClassifier(n_jobs=-1)
mygb = MyGradientBoostingClassifier()

vc = VotingClassifier(estimators=[('rf', rf), ('mygb', mygb)],
                      voting='soft',
                      weights=[1, 2])

mygb.fit(X5, y5)

A sample of y (a NumPy array): [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-c56d4cac146f> in <module>()
     13                         weights=[1,2])
     14 
---> 15 mygb.fit(X5, y5)

<ipython-input-62-c56d4cac146f> in fit(self, X, y)
      3         print np.shape(y), np.shape(X), Counter(y), type(y)
      4         print y[:20]
----> 5         return super(GradientBoostingClassifier, self).fit(X, y, sample_weight=y)
      6 
      7 

/Users/a/anaconda/lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.pyc in fit(self, X, y, sample_weight, monitor)
    987 
    988             # fit initial model - FIXME make sample_weight optional
--> 989             self.init_.fit(X, y, sample_weight)
    990 
    991             # init predictions

/Users/a/anaconda/lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.pyc in fit(self, X, y, sample_weight)
    117 
    118         if neg == 0 or pos == 0:
--> 119             raise ValueError('y contains non binary labels.')
    120         self.prior = self.scale * np.log(pos / neg)
    121 

ValueError: y contains non binary labels.

Upvotes: 1

Views: 900

Answers (1)

ogrisel

Reputation: 40159

For classification models, y is expected to hold integer class labels (here 0 and 1), so it does not make sense to use the same array both as the classification target and as the sample weight.
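To see why, here is a small NumPy check (a sketch that uses the 20-element sample of y from the question) of the total weight each class receives when the labels themselves are passed as sample_weight:

```python
import numpy as np

# The 20-element sample of y shown in the question (0/1 labels stored as floats)
y = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
              1., 0., 0., 0., 0., 1., 0., 0., 0., 0.])

# What the wrapper does: reuse the labels as per-sample weights
sample_weight = y

# Total weight carried by each class
weight_class_0 = sample_weight[y == 0].sum()
weight_class_1 = sample_weight[y == 1].sum()

print(weight_class_0, weight_class_1)  # 0.0 2.0
```

Every class-0 sample gets weight 0, so class 0 contributes nothing to the fit.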

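All samples with weight 0 are ignored by the model. With sample_weight=y, every sample of class 0 gets weight 0, so only the class-1 samples are left, and it is not possible to train a binary classification model on samples from a single class of the training set. That is what the check in the traceback catches: the initial estimator fitted by init_.fit finds that class 0 has zero total weight and raises "y contains non binary labels.", even though y itself only contains 0 and 1.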
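The check at lines 118-120 of the traceback can be mimicked as below. This is a sketch, not scikit-learn's code: it assumes that pos and neg are the sample-weighted counts of class 1 and class 0 (the lines computing them are not shown in the traceback), and the function name log_odds_prior is hypothetical. It also shows that label-independent weights, here class-balanced ones built with the n_samples / (n_classes * np.bincount(y)) formula that scikit-learn documents for class_weight='balanced', keep both counts positive:

```python
import numpy as np

def log_odds_prior(y, sample_weight):
    """Hypothetical mimic of the check at lines 118-120 of the traceback.

    Assumes pos/neg are the sample-weighted counts of class 1 and class 0.
    """
    pos = np.sum(sample_weight * y)
    neg = np.sum(sample_weight * (1 - y))
    if neg == 0 or pos == 0:
        raise ValueError('y contains non binary labels.')
    return np.log(pos / neg)

# The 20-element sample of y shown in the question
y = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
              1., 0., 0., 0., 0., 1., 0., 0., 0., 0.])

# Labels used as weights: class 0 gets zero total weight, so the check fails
try:
    log_odds_prior(y, sample_weight=y)
except ValueError as exc:
    print(exc)  # y contains non binary labels.

# Class-balanced weights: n_samples / (n_classes * size of the sample's class)
labels = y.astype(int)
counts = np.bincount(labels)                  # samples per class: [18, 2]
balanced = len(y) / (2 * counts[labels])

# Both classes now carry a total weight of about 10, so the prior is close to 0
print(log_odds_prior(y, sample_weight=balanced))
```

Since GradientBoostingClassifier.fit already accepts a sample_weight argument (it appears in the fit signature in the traceback), an array like this can be passed directly, e.g. GradientBoostingClassifier().fit(X5, y5, sample_weight=w), instead of overriding fit to reuse y.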

Upvotes: 1
