Reputation: 3086
I am struggling to compute the class labels in a simple binary classification problem, given a 2D NumPy array of probabilities for each class.
For example:
prob_01 =
array([[ 0.49253953,  0.50746047],
       [ 0.01041495,  0.98958505],
       [ 0.76774408,  0.23225592],
       ...,
       [ 0.79755047,  0.20244953],
       [ 0.27228677,  0.72771323],
       [ 0.26953926,  0.73046074]])
where the rows are instances and the columns hold each instance's probability of belonging to class 0 and class 1, respectively. For example, with threshold = 0.5, one should get:
labels_01 =
array([[1],
       [1],
       [0],
       ...,
       [0],
       [1],
       [1]])
What is the simplest, most Pythonic way to produce the labels_01 array?
Upvotes: 1
Views: 3847
Reputation: 148
Threshold the class 0 probabilities (first column); an instance is labeled 1 when its class 0 probability is below the threshold:
threshold = 0.5
labels_01 = prob_01[:,0] < threshold
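Why thresholding column 0 gives the class 1 label: each row of prob_01 appears to sum to 1 (e.g. 0.49253953 + 0.50746047 = 1). Assuming that holds for every row, with $p_{i,0}$ and $p_{i,1}$ the two probabilities of instance $i$ and $t$ the threshold, a short derivation:

```latex
p_{i,0} + p_{i,1} = 1
\quad\Longrightarrow\quad
p_{i,0} < t \iff p_{i,1} > 1 - t,
\qquad\text{so at } t = 0.5:\quad p_{i,0} < 0.5 \iff p_{i,1} > 0.5 .
```

At threshold = 0.5 both readings agree. For another threshold meant to apply to the class 1 probability, comparing prob_01[:,1] against it directly avoids silently using 1 − t instead.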
To get integers instead of booleans, cast the result (the built-in int works here; the old np.int alias was deprecated in NumPy 1.20 and removed in 1.24):
labels_01 = (prob_01[:,0] < threshold).astype(int)
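A minimal runnable sketch of this approach, using only the six rows that are visible in the question (the elided "..." rows are left out, so this is not the full data set):

```python
import numpy as np

# Only the six rows shown in the question; the elided "..." rows are omitted.
prob_01 = np.array([
    [0.49253953, 0.50746047],
    [0.01041495, 0.98958505],
    [0.76774408, 0.23225592],
    [0.79755047, 0.20244953],
    [0.27228677, 0.72771323],
    [0.26953926, 0.73046074],
])
threshold = 0.5

# Label 1 when the class-0 probability is below the threshold, else 0.
# astype(int) turns the boolean mask into 0/1 integers.
labels_01 = (prob_01[:, 0] < threshold).astype(int)
print(labels_01)  # [1 1 0 0 1 1]
```

Note that this yields a 1-D array of shape (n,), not the (n, 1) column layout shown in the question.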
Or just use prob_01 < threshold to compare both columns at once, and select the column you need afterwards.
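A sketch of the "both columns at once" variant, again assuming only the six visible rows. It also shows one way (indexing with a list, [0] instead of 0) to keep the column axis, which matches the (n, 1) layout of labels_01 in the question:

```python
import numpy as np

# Only the six rows shown in the question; the elided "..." rows are omitted.
prob_01 = np.array([
    [0.49253953, 0.50746047],
    [0.01041495, 0.98958505],
    [0.76774408, 0.23225592],
    [0.79755047, 0.20244953],
    [0.27228677, 0.72771323],
    [0.26953926, 0.73046074],
])
threshold = 0.5

# Elementwise comparison of every probability; same shape as prob_01.
below = prob_01 < threshold  # shape (6, 2), dtype bool

# Column 0 is True where P(class 0) < threshold, i.e. the class-1 label.
labels_1d = below[:, 0].astype(int)   # shape (6,)

# Indexing with a list keeps the second axis, giving a column vector.
labels_01 = below[:, [0]].astype(int)  # shape (6, 1)

print(labels_1d)        # [1 1 0 0 1 1]
print(labels_01.shape)  # (6, 1)
```

Column 1 of below holds the opposite labels (True where P(class 1) < threshold), so picking the right column matters.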
Upvotes: 4