SHIN

Reputation: 11

Python Traceback - gradient_boosting.py: how can I fix this type of error?

I am trying to detect SQL injection using a GradientBoostingClassifier:

X = dataframe.as_matrix(['token_length','entropy','sqli_g_means','plain_g_means'])

# encode categorical feature
from sklearn.preprocessing import LabelEncoder
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(dataframe['type'].tolist())

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state = 0)

from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=7, random_state=0).fit(X_train, y_train)
print "Gradient Boosting Tree Accuracy: %f" % clf.score(X_test, y_test)

An error occurs while training the model:

Traceback (most recent call last):
  File "ml_sql_injection.py", line 136, in <module>
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=7, random_state=0).fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 1404, in fit
    y = self._validate_y(y, sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 1968, in _validate_y
    % n_trim_classes)
ValueError: y contains 1 class after sample_weight trimmed classes with zero weights, while a minimum of 2 classes are required.

How can I fix this type of error?

Upvotes: 1

Views: 3744

Answers (2)

James L.

Reputation: 14515

I had this problem too.

It's actually a descriptive error: every sample in y has the same label. In reality my y had many labels, but if you feed in a sub-sample of y (or if the model builder samples internally), that selection can end up containing only one label.

I fixed it by shuffling the data before splitting, which reliably resolved this boosting error.

For my particular data in Pandas, it looks like:

x = pd.read_csv(filename, delimiter=",", header=None)
x = x.sample(frac=1)   # shuffle the rows -> fixes the boosting error
y = x.iloc[:, [0]]     # extract the label from the first column
x = x.drop([0], axis=1)  # drop the label column from X
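If you already have X and y as separate arrays, the same shuffle can be done with `sklearn.utils.shuffle`, which applies one consistent permutation to both. This is a minimal sketch with toy data standing in for the real features:

```python
import numpy as np
from sklearn.utils import shuffle

# Toy stand-in data: row k is [2k, 2k+1]; first 5 rows labelled 0, rest 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# shuffle() permutes both arrays with the same random permutation,
# so every row keeps its original label.
X, y = shuffle(X, y, random_state=0)
```

After this, any contiguous slice (such as an internal sub-sample) is far less likely to contain a single class.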

This also gave me a large and consistent improvement in estimator accuracy.


Upvotes: 1

pdubey

Reputation: 41

Although I guess it's too late, I would still like to answer this question for others.

This error means that y_train contains only 1 value, i.e. there is only 1 class available for classification, but you need at least 2. When you split your dataset into train and test, y_train is left with only one class.
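A common way to guard against this is to pass `stratify=y` to `train_test_split` (available in scikit-learn since 0.17), which preserves the class ratio in both splits. A minimal sketch with toy imbalanced data in place of the original SQL-injection features:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced data: 90 benign (class 0) vs 10 SQLi (class 1) samples.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)

# stratify=y keeps the 90/10 class ratio in both splits, so neither
# y_train nor y_test can collapse to a single class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
```

Without `stratify`, a rare class can land entirely in the test split, leaving y_train with one class and triggering exactly this ValueError.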

Upvotes: 4
