aharbiy

Reputation: 13

Error with model_evaluation_utils confusion matrix

I am working through this tutorial https://medium.com/@sarahcy/read-this-how-winners-create-life-changing-habits-that-actually-work-atomic-habits-by-james-ac7a3c6df911 and I am currently trying to run the model evaluation part:

class_labels = list(set(labels))
meu.display_model_performance_metrics(true_labels=y_test, predicted_labels=predictions, classes=class_labels)

I get this error:

Prediction Confusion Matrix:
------------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/blabla/XAI/model_evaluation_utils.py", line 87, in display_model_performance_metrics
    classes=classes)
  File "/blabla/XAI/model_evaluation_utils.py", line 62, in display_confusion_matrix
    labels=level_labels), 
TypeError: __new__() got an unexpected keyword argument 'labels'

How can I solve this issue?

BR

Edit: Here is the code up to the problematic line:

# part1
import pandas as pd 
import numpy as np 
import model_evaluation_utils as meu
import matplotlib.pyplot as plt
from collections import Counter
import shap
import eli5

import warnings
warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')

shap.initjs()

#part 2
data, labels = shap.datasets.adult(display=True)
labels = np.array([int(label) for label in labels])

print(data.shape, labels.shape)
data.head()

#part 3
Counter(labels)

#part 4
cat_cols = data.select_dtypes(['category']).columns
data[cat_cols] = data[cat_cols].apply(lambda x: x.cat.codes)
data.head()

#part 5
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)
X_train.head(3)

data_disp, labels_disp = shap.datasets.adult(display=True)
X_train_disp, X_test_disp, y_train_disp, y_test_disp = train_test_split(data_disp, labels_disp, test_size=0.3, random_state=42)
print(X_train_disp.shape, X_test_disp.shape)
X_train_disp.head(3)

#part 6
import xgboost as xgb
xgc = xgb.XGBClassifier(n_estimators=500, max_depth=5, base_score=0.5,
                        objective='binary:logistic', random_state=42)
xgc.fit(X_train, y_train)

#part 7
predictions = xgc.predict(X_test)
predictions[:10]

#part 8
class_labels = list(set(labels))
meu.display_model_performance_metrics(true_labels=y_test, predicted_labels=predictions, classes=class_labels)

Upvotes: 1

Views: 914

Answers (1)

Taimour Mourad

Reputation: 11

Remove this function from model_evaluation_utils.py:

def display_confusion_matrix(true_labels, predicted_labels, classes=[1, 0]):
    total_classes = len(classes)
    level_labels = [total_classes * [0], list(range(total_classes))]

    cm = metrics.confusion_matrix(y_true=true_labels,
                                  y_pred=predicted_labels,
                                  labels=classes)
    # The `labels=` keyword in the two pd.MultiIndex calls below is what
    # raises the TypeError on recent pandas versions
    cm_frame = pd.DataFrame(data=cm,
                            columns=pd.MultiIndex(levels=[['Predicted:'], classes],
                                                  labels=level_labels),
                            index=pd.MultiIndex(levels=[['Actual:'], classes],
                                                labels=level_labels))
    print(cm_frame)

and its contents.
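
For context: the traceback points at the pd.MultiIndex(...) calls in that function. The likely cause (my assumption, since the question does not name a pandas version) is a pandas API change: the MultiIndex constructor's `labels` keyword was renamed to `codes` in pandas 0.24 and removed in 1.0. A minimal sketch of that rename:

    import pandas as pd

    # Old spelling, fails on pandas >= 1.0 with:
    # TypeError: __new__() got an unexpected keyword argument 'labels'
    # idx = pd.MultiIndex(levels=[['Actual:'], [0, 1]], labels=[[0, 0], [0, 1]])

    # New spelling: `codes=` replaced `labels=`
    idx = pd.MultiIndex(levels=[['Actual:'], [0, 1]], codes=[[0, 0], [0, 1]])
    print(idx)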

Then go to the notebook. At the last line of your code:

meu.display_model_performance_metrics(true_labels=y_test,predicted_labels=predictions, classes=class_labels)

you will notice that the function being called is "display_model_performance_metrics".

So go back to model_evaluation_utils.py and find that function:

def display_model_performance_metrics(true_labels, predicted_labels, classes=[1,0]):
    print('Model Performance metrics:')
    print('-'*30)
    get_metrics(true_labels=true_labels, predicted_labels=predicted_labels)
    print('\nModel Classification report:')
    print('-'*30)
    display_classification_report(true_labels=true_labels, predicted_labels=predicted_labels, 
                                  classes=classes)
    print('\nPrediction Confusion Matrix:')
    print('-'*30)
    display_confusion_matrix(true_labels=true_labels, predicted_labels=predicted_labels, 
                             classes=classes)

and get rid of the display_confusion_matrix call at the end:

display_confusion_matrix(true_labels=true_labels,predicted_labels=predicted_labels,classes=classes)

so that it becomes a little simpler:

def display_model_performance_metrics(true_labels, predicted_labels, classes=[1,0]):
    print('Model Performance metrics:')
    print('-'*30)
    get_metrics(true_labels=true_labels, predicted_labels=predicted_labels)
    print('\nModel Classification report:')
    print('-'*30)
    display_classification_report(true_labels=true_labels, predicted_labels=predicted_labels, 
                                  classes=classes)
    print('\nPrediction Confusion Matrix:')
    print('-'*30)

Finally, if you still want to use:

display_confusion_matrix(true_labels=true_labels, predicted_labels=predicted_labels, classes=classes)

you can define and call it directly in your notebook instead of in the .py file. That's it :)
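
If you go that route, here is a minimal sketch of the function adapted for a notebook cell. It assumes a recent pandas, so the original `labels=` keyword is replaced with the renamed `codes=` keyword, and it assumes y_test, predictions, and class_labels as defined in the question's notebook:

    import pandas as pd
    from sklearn import metrics

    def display_confusion_matrix(true_labels, predicted_labels, classes=[1, 0]):
        # Two-level index: one outer label ('Actual:'/'Predicted:'), one code per class
        total_classes = len(classes)
        level_labels = [total_classes * [0], list(range(total_classes))]

        cm = metrics.confusion_matrix(y_true=true_labels,
                                      y_pred=predicted_labels,
                                      labels=classes)
        # `codes=` is the renamed `labels=` keyword (pandas >= 0.24)
        cm_frame = pd.DataFrame(data=cm,
                                columns=pd.MultiIndex(levels=[['Predicted:'], classes],
                                                      codes=level_labels),
                                index=pd.MultiIndex(levels=[['Actual:'], classes],
                                                    codes=level_labels))
        print(cm_frame)

    display_confusion_matrix(true_labels=y_test, predicted_labels=predictions,
                             classes=class_labels)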

Upvotes: 1
