Bex T.

Reputation: 1796

How to specify the positive class manually before fitting Sklearn estimators and transformers

I am trying to predict credit card approvals using the relevant dataset from UCI ML Repo. The problem is that the target encodes the applications for credit cards as '+' for approved and '-' for rejected.

As there are somewhat more rejected applications in the target, all scorers and estimators treat the rejected class as the positive one, when it should be the other way around. Because of this, my confusion matrix is messed up: I think the True Positives and True Negatives, and the False Positives and False Negatives, end up swapped:

[screenshot of the resulting confusion matrix]
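To illustrate (a minimal made-up sketch, not my real data), I think this is what happens: because the string labels get sorted, '-' ends up in the position that the usual tn, fp, fn, tp = cm.ravel() unpacking treats as the positive class:

import numpy as np
from sklearn.metrics import confusion_matrix

# made-up labels that mimic my target ('+' approved, '-' rejected)
y_true = np.array(['+', '+', '+', '-', '-'])
y_pred = np.array(['+', '-', '+', '-', '-'])

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]
#  [0 2]]

# with the usual unpacking, the "tp" slot counts the '-' (rejected) class
tn, fp, fn, tp = cm.ravel()
print(tp)  # 2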

How can I specify the positive class manually?

Upvotes: 1

Views: 1350

Answers (1)

afsharov

Reputation: 5164

I am not aware of any scikit-learn estimator or transformer that lets you flip the positive and negative class identifiers via a parameter. But I can think of two ways to work around this:


Method 1: You transform the array labels yourself before fitting the estimator

This is easy to do for NumPy arrays:

import numpy as np

y = np.array(['+', '+', '+', '-', '-'])
# map '+' (approved) to 1 and '-' (rejected) to 0
y_transformed = [1 if i == '+' else 0 for i in y]

and also for pandas Series objects:

import pandas as pd

y = pd.Series(['+', '+', '+', '-', '-'])
y_transformed = y.map({'+': 1, '-': 0})

In both cases, the resulting values are [1, 1, 1, 0, 0] (a plain Python list in the first case, a pandas Series in the second).
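If you prefer to stay fully vectorized, the same mapping can also be sketched with np.where (same assumption that '+' should become 1):

import numpy as np

y = np.array(['+', '+', '+', '-', '-'])
# '+' (approved) becomes 1, anything else becomes 0
y_transformed = np.where(y == '+', 1, 0)
print(y_transformed)  # [1 1 1 0 0]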


Method 2: You define the labels parameter in confusion_matrix

scikit-learn's confusion_matrix has a labels parameter that lets you reorder the classes. Use it like this:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0])

# default label order: [0, 1]
print(confusion_matrix(y_true, y_pred))
# output:
# [[2 0]
#  [1 2]]

# explicit label order: class 1 first, class 0 second
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# output:
# [[2 1]
#  [0 2]]
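The same trick also works directly on the original string labels, assuming '+' is the class you want treated as positive: pass labels=['-', '+'] so that '+' lands in the conventional "positive" position:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array(['+', '+', '+', '-', '-'])
y_pred = np.array(['+', '-', '+', '-', '-'])

# '-' first and '+' second, so '+' occupies the usual positive slot
cm = confusion_matrix(y_true, y_pred, labels=['-', '+'])
print(cm)
# [[2 0]
#  [1 2]]

tn, fp, fn, tp = cm.ravel()
print(tp)  # 2 correctly predicted '+' applications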

Upvotes: 2
