0101010

Reputation: 25

Incorrect labels in confusion matrix

I tried to plot a confusion matrix for a KNN classifier in Python, but the class labels it shows are wrong.

The class attribute of the dataset takes the values 2 (benign) and 4 (malignant), but when I plot the confusion matrix, both labels are shown as 2. The code I use is:

Data source: http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

KNN classifier on Breast Cancer Wisconsin (Diagnostic) Data Set from UCI:

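import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import neighbors
from sklearn.model_selection import train_test_split

data = pd.read_csv('/breast-cancer-wisconsin.data')
data.replace('?', 0, inplace=True)  # treat missing values as 0
data.drop(columns='id', inplace=True)  # drop the id column

# Features are every column except the class; the target is the class column
X = np.array(data.drop(columns=' class '))
Y = np.array(data[' class '])

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, Y_train)

accuracy = clf.score(X_test, Y_test)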

Plot confusion matrix

from sklearn.metrics import plot_confusion_matrix

disp = plot_confusion_matrix(clf, X_test, Y_test,
                               display_labels=Y,
                               cmap=plt.cm.Blues,)

Confusion matrix

Upvotes: 2

Views: 1501

Answers (1)

yatu

Reputation: 88226

The problem is that you're passing Y as the display_labels argument, which should contain only the target names used for plotting, one per class. As it stands, the plot takes the first two values that appear in Y, which happen to be 2 and 2. Note too that, as mentioned in the docs, display_labels defaults to labels when labels is provided, so you just need:

from sklearn.metrics import plot_confusion_matrix
fig, ax = plt.subplots(figsize=(8,8))
disp = plot_confusion_matrix(clf, X_test, Y_test,
                               labels=np.unique(Y),
                               cmap=plt.cm.Blues,ax=ax)


Upvotes: 1
