Reputation: 1796
I am trying to predict credit card approvals using the relevant dataset from the UCI ML Repository. The problem is that the target encodes credit card applications as '+' for approved and '-' for rejected.
Because the target contains slightly more rejected applications, every scorer and estimator treats the rejected class as positive, when it should be the other way around. As a result, my confusion matrix is wrong: I think the true positives and true negatives are swapped, and so are the false positives and false negatives.
How can I specify the positive class manually?
Upvotes: 1
Views: 1350
Reputation: 5164
I do not know of scikit-learn estimators or transformers that let you flip positive and negative class identifiers as a parameter. But I can think of two ways to work around this:
Method 1: Transform the labels yourself before fitting the estimator
This is easy to do for numpy arrays:
import numpy as np
y = np.array(['+', '+', '+', '-', '-'])
# map the approved class '+' to 1 and everything else to 0
y_transformed = [1 if i == '+' else 0 for i in y]
and for pandas Series objects:
import pandas as pd
y = pd.Series(['+', '+', '+', '-', '-'])
# map the approved class '+' to 1 and the rejected class '-' to 0
y_transformed = y.map({'+': 1, '-': 0})
In both cases the resulting values are [1, 1, 1, 0, 0].
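To make Method 1 concrete, here is a minimal sketch that handles both container types with one function. The helper name encode_target and its positive argument are hypothetical and only serve the illustration; the sketch assumes the target still holds the raw '+' and '-' strings.

```python
import numpy as np
import pandas as pd


def encode_target(y, positive="+"):
    """Return a 0/1 integer array where `positive` maps to 1 (hypothetical helper)."""
    # The elementwise comparison yields booleans; casting to int gives 1 and 0.
    return (np.asarray(y) == positive).astype(int)


y_np = np.array(['+', '+', '+', '-', '-'])
y_pd = pd.Series(['+', '+', '+', '-', '-'])

print(encode_target(y_np))
print(encode_target(y_pd))
```

Fitting the estimator on the encoded target means label 1 is the approved class everywhere downstream, so no per-metric configuration should be needed.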
Method 2: Set the labels parameter in confusion_matrix
scikit-learn's confusion_matrix has a labels parameter that lets you reorder the labels. Use it like this:
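import numpy as np
from sklearn.metrics import confusion_matrix
y_true = np.array([1, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0])
# default order: labels are sorted, so class 0 comes first
print(confusion_matrix(y_true, y_pred))
# output
[[2 0]
 [1 2]]
# explicit order: class 1 comes first
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# output
[[2 1]
 [0 2]]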
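The same idea should also work without recoding the target. The sketch below assumes the original '+' and '-' strings are kept, and reuses the toy predictions from above with 1 written as '+' and 0 as '-'. It additionally uses the pos_label parameter that scikit-learn's binary metrics such as recall_score accept; that goes slightly beyond the two methods described here, so treat it as an optional extra.

```python
from sklearn.metrics import confusion_matrix, recall_score

y_true = ['+', '+', '+', '-', '-']
y_pred = ['+', '-', '+', '-', '-']

# Listing '+' first puts the approved class in the first row and column,
# so the flattened matrix reads TP, FN, FP, TN.
cm = confusion_matrix(y_true, y_pred, labels=['+', '-'])
tp, fn, fp, tn = cm.ravel()
print(cm)

# pos_label tells the metric which label to treat as the positive class.
recall = recall_score(y_true, y_pred, pos_label='+')
print(recall)
```

Note that the unpacking order tp, fn, fp, tn only holds because the positive label is listed first; with the default sorted order the flattened matrix would read differently.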
Upvotes: 2