Thresholding of probabilities in a binary classification

Question

I am struggling to compute the class labels in a simple binary classification problem, given a 2D NumPy array with the probabilities for each class.

For example:

prob_01 = 
array([[ 0.49253953,  0.50746047],
       [ 0.01041495,  0.98958505],
       [ 0.76774408,  0.23225592],
       ..., 
       [ 0.79755047,  0.20244953],
       [ 0.27228677,  0.72771323],
       [ 0.26953926,  0.73046074]])

where the rows are instances and the columns contain the probabilities of belonging to class 0 and class 1, respectively. For example, with threshold = 0.5, one should get:

labels_01= 
array([[ 1],
       [ 1],
       [ 0],
       ..., 
       [ 0],
       [ 1],
       [ 1]])

What is the simplest, most Pythonic way to produce the labels_01 array?

tku137 · Accepted Answer

Using the class 0 probabilities (first column), an instance gets label 1 when that probability is below the threshold:

threshold = 0.5
labels_01 = prob_01[:,0] < threshold

To get integers instead of booleans, cast with the built-in int (the np.int alias was removed in NumPy 1.24):

labels_01 = (prob_01[:,0] < threshold).astype(int)
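
As a quick check, here is a minimal self-contained sketch of the cast. The small array holds only the first three rows shown in the question, not the full data. The reshape to a column is an assumption about the desired output, based on the (n, 1) layout of labels_01 in the question.

```python
import numpy as np

# First three rows from the question; the full array is not shown there.
prob_01 = np.array([[0.49253953, 0.50746047],
                    [0.01041495, 0.98958505],
                    [0.76774408, 0.23225592]])

threshold = 0.5

# Label is 1 when the class-0 probability falls below the threshold.
labels_01 = (prob_01[:, 0] < threshold).astype(int)

# Optional: turn the 1-D result into a column vector of shape (n, 1).
labels_col = labels_01.reshape(-1, 1)

print(labels_01)         # [1 1 0]
print(labels_col.shape)  # (3, 1)
```

Without the reshape, the result is a flat array of shape (n,), which is usually what downstream code such as metric functions expects.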

Or compare the whole array at once:

prob_01 < threshold

This thresholds both columns in one step, and you can index the column you need afterwards.
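
The whole-array comparison can be sketched as follows, using the last three rows shown in the question as sample data. The variable names are illustrative. The class-1 variant assumes each row sums to 1, and the argmax shortcut is an alternative that only matches for a threshold of 0.5.

```python
import numpy as np

# Last three rows from the question, used here as sample data.
prob_01 = np.array([[0.79755047, 0.20244953],
                    [0.27228677, 0.72771323],
                    [0.26953926, 0.73046074]])

threshold = 0.5

# Boolean mask with the same shape as prob_01.
below = prob_01 < threshold

# Column 0 of the mask: True where P(class 0) < threshold, i.e. label 1.
labels_from_mask = below[:, 0].astype(int)

# Same rule expressed on the class-1 column, assuming rows sum to 1:
# P(class 0) < t  is equivalent to  P(class 1) > 1 - t.
labels_from_class1 = (prob_01[:, 1] > 1 - threshold).astype(int)

# For threshold 0.5 only, picking the larger column gives the same labels
# (apart from exact ties at 0.5, where argmax returns 0).
labels_argmax = prob_01.argmax(axis=1)

print(labels_from_mask)  # [0 1 1]
```

For any threshold other than 0.5, the explicit comparison is the one to use, since argmax ignores the threshold entirely.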
