voo
voo

Reputation: 1333

How to compute optimal threshold for accuracy

My classifier produces soft classifications and I wish to select an optimal threshold (that is, one that maximizes accuracy) from the results of the method on the training cases, and use this threshold to produce the hard classification. While in general the problem is relatively easy, I find it hard to optimise the code so that the computation does not last forever. Below you'll find the code that essentially recreates the optimisation procedure on some dummy data. Could you please point me into any direction which could possibly improve performance?

y_pred = np.random.rand(400000)
y_true = np.random.randint(2, size=400000)
accs = [(accuracy_score(y_true, y_pred > t), t) for t in np.unique(y_pred)]
train_acc, train_thresh = max(accs, key=lambda pair: pair[0])

I realize that I could sort both y_pred and y_true prior to the loop, and use that to my advantage when binarizing y_pred but that didn't bring much improvement (unless I did something wrong).

Any help would be much appreciated.

Upvotes: 2

Views: 2987

Answers (2)

Alexander Yalunin
Alexander Yalunin

Reputation: 419

I wrote a helper function in python:

def opt_threshold_acc(y_true, y_pred):
    A = list(zip(y_true, y_pred))
    A = sorted(A, key=lambda x: x[1])
    total = len(A)
    tp = len([1 for x in A if x[0]==1])
    tn = 0
    th_acc = []
    for x in A:
        th = x[1]
        if x[0] == 1:
            tp -= 1
        else:
            tn += 1
        acc = (tp + tn) / total
        th_acc.append((th, acc))
    return max(th_acc, key=lambda x: x[1])

Upvotes: 0

Diego
Diego

Reputation: 18379

Sort y_pred descendantly and use Kadane's Algorithm to calculate an index i such that the subarray of y_true from 0 to i has maximum sum. Your optimal threshold b is then b = (y_pred[i] + y_pred[i+i]) / 2. This will be the same output that SVM would give you, that is, the hyperplane (or for your 1-dimensional case, a threshold) that maximizes the margin between classes.

Upvotes: 2

Related Questions