Kenia Cevallos
Kenia Cevallos

Reputation: 1

how to i handle a multiclass decision tree?

I am new to python & ML, but I am trying to use sklearn to build a decision tree. I have many categorical features and I have transformed them into numerical variables. However, my target feature is a multiclass and I am run into an error. How should I handle targets that are multiclass?

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

from sklearn.model_selection import train_test_split

#SPLIT DATA INTO TRAIN AND TEST SET
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size =0.30, #by default is 75%-25%
                                                    #shuffle is set True by default,
                                                    stratify=y, #preserve target propotions 
                                                    random_state= 123) #fix random seed for replicability

print(X_train.shape, X_test.shape)


from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(criterion='gini', max_depth=3, min_samples_split=4, min_samples_leaf=2)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# criterion : "gini", "entropy"
# max_depth : The maximum depth of the tree.
# min_samples_split : The minimum number of samples required to split an internal node:
# min_samples_leaf : The minimum number of samples required to be at a leaf node. 

#DEFINE YOUR CLASSIFIER and THE PARAMETERS GRID
from sklearn.tree import DecisionTreeClassifier
import numpy as np

classifier = DecisionTreeClassifier()
parameters = {'criterion': ['entropy','gini'], 
              'max_depth': [3,4,5],
              'min_samples_split': [5,10],
              'min_samples_leaf': [2]}

from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(classifier, parameters, cv=3, scoring = 'f1', verbose=50, n_jobs=-1, refit=True)

enter image description here

Upvotes: 0

Views: 1239

Answers (2)

Kenia Cevallos
Kenia Cevallos

Reputation: 1

Thank you so much for you help. I figured it out. It was actually on the gs line. In scoring, I needed to adjust what you mentioned. So i revised scoring = f1_macro

gs = GridSearchCV(classifier, parameters, cv=3, scoring=f1_macro, verbose=50, n_jobs=-1, refit=True)

Upvotes: 0

Danylo Baibak
Danylo Baibak

Reputation: 2326

You should specify the score function manually:

from sklearn.metrics import f1_score, make_scorer

f1 = make_scorer(f1_score, average='weighted')

....

gs = GridSearchCV(classifier, parameters, cv=3, scoring=f1, verbose=50, n_jobs=-1, refit=True)

Upvotes: 1

Related Questions