grantaguinaldo

Reputation: 119

How to return an array of false positives from a confusion matrix in scikit-learn?

I am working on building a binary classifier in scikit-learn that will classify text reviews. The basic workflow includes the following:

from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Split the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Instantiate a model.
nb = MultinomialNB()

# Train the model.
nb.fit(X_train, y_train)

# Make predictions using the trained model.
y_pred_class = nb.predict(X_test)

# View the confusion matrix.
confusion_matrix(y_test, y_pred_class)

#Output of confusion matrix
array([[295,  13],
       [ 80,  70]])

Based on the confusion matrix, there are 13 false positives and 80 false negatives.
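As a quick check on those counts, here is a minimal sketch that unpacks the matrix shown above. It assumes the labels are 0 and 1 with 1 as the positive class, so that rows are true labels and columns are predicted labels; the matrix is copied by hand rather than recomputed.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows are true labels, columns are predicted labels).
cm = np.array([[295, 13],
               [80, 70]])

# For a binary problem with labels 0 and 1, flattening gives the counts
# in the order: true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = cm.ravel()
print(fp, fn)
```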

I want to see the 13 text reviews that are classified as false positives.

I followed this post to see if I could get a list of the 13 entries that are being classified as false positives.

However, when I run the following:

X_test[y_test != y_pred_class]

I get the following object:

<458x758 sparse matrix of type '<class 'numpy.float64'>'
with 16890 stored elements in Compressed Sparse Row format>

This appears to return all of the rows in X_test (458 total entries). I expected an object with fewer than 458 entries.

I also expected to see the text data of X_test as opposed to an object.

My question is this:

How can I return the 13 entries from X_test that were misclassified as false positives? I am looking for an output that looks like the example below.

2175    This has to be the worst restaurant in terms o...
1781    If you like the stuck up Scottsdale vibe this ...
2674    I'm sorry to be what seems to be the lone one ...
Name: text, dtype: object

Upvotes: 2

Views: 2741

Answers (1)

Vivek Kumar

Reputation: 36619

A false positive is a sample that is predicted as 1 but whose true label is different. So in addition to y_test != y_pred_class, you also need to check that the value in y_pred_class is 1.

Try this:

import numpy as np
false_positives = np.logical_and(y_test != y_pred_class, y_pred_class == 1)

X_test[false_positives]
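Note that X_test here is the vectorized feature matrix, so indexing it returns sparse rows rather than review text. To see the reviews themselves, the same mask can be applied to the raw text for the test rows. The following is only a sketch on toy data: text_test is a hypothetical Series holding the unvectorized test reviews in the same row order as X_test (it does not appear in the question), and the labels are assumed to be 0 and 1. The mask is built from plain NumPy arrays so that it is applied by position rather than by pandas index label.

```python
import numpy as np
import pandas as pd

# Hypothetical raw reviews for the test rows, kept before vectorization.
text_test = pd.Series(
    ["great food", "worst service ever", "loved it", "never again", "just ok"],
    index=[2175, 1781, 2674, 310, 42],
    name="text",
)
y_test = pd.Series([1, 0, 1, 0, 0], index=text_test.index)  # true labels
y_pred_class = np.array([1, 1, 0, 1, 0])                    # model predictions

# False positive: predicted 1 while the true label is 0.
# np.asarray drops the pandas index, so the mask is purely positional.
fp_mask = (y_pred_class == 1) & (np.asarray(y_test) == 0)

# Boolean indexing keeps only the rows where the mask is True.
false_positive_reviews = text_test[fp_mask]
print(false_positive_reviews)
```

If the text was split in the same train_test_split call as the features, or with the same random_state on data in the same order, the rows should line up with X_test and y_test.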

Upvotes: 4
