lemon

Reputation: 747

How to choose optimal threshold for class probabilities?

The output of my neural network is a table of predicted class probabilities for multi-label classification:

print(probabilities)

|   |      1       |      3       | ... |     8354     |     8356     |     8357     |
|---|--------------|--------------|-----|--------------|--------------|--------------|
| 0 | 2.442745e-05 | 5.952136e-06 | ... | 4.254002e-06 | 1.894523e-05 | 1.033957e-05 |
| 1 | 7.685694e-05 | 3.252202e-06 | ... | 3.617730e-06 | 1.613792e-05 | 7.356643e-06 |
| 2 | 2.296657e-06 | 4.859554e-06 | ... | 9.934525e-06 | 9.244772e-06 | 1.377618e-05 |
| 3 | 5.163169e-04 | 1.044035e-04 | ... | 1.435158e-04 | 2.807420e-04 | 2.346930e-04 |
| 4 | 2.484626e-06 | 2.074290e-06 | ... | 9.958628e-06 | 6.002510e-06 | 8.434519e-06 |
| 5 | 1.297477e-03 | 2.211737e-04 | ... | 1.881772e-04 | 3.171079e-04 | 3.228884e-04 |

I converted it to class labels using a threshold (0.2) to measure the accuracy of my prediction:

predictions = (probabilities > 0.2).astype(int)
print(predictions)

|   | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... |    0 |    0 |    0 |
| 1 | 0 | 0 | ... |    0 |    0 |    0 |
| 2 | 0 | 0 | ... |    0 |    0 |    0 |
| 3 | 0 | 0 | ... |    0 |    0 |    0 |
| 4 | 0 | 0 | ... |    0 |    0 |    0 |
| 5 | 0 | 0 | ... |    0 |    0 |    0 |

I also have a test set:

print(Y_test)

|   | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... |    0 |    0 |    0 |
| 1 | 0 | 0 | ... |    0 |    0 |    0 |
| 2 | 0 | 0 | ... |    0 |    0 |    0 |
| 3 | 0 | 0 | ... |    0 |    0 |    0 |
| 4 | 0 | 0 | ... |    0 |    0 |    0 |
| 5 | 0 | 0 | ... |    0 |    0 |    0 |

Question: How can I build an algorithm in Python that chooses the optimal threshold, i.e. the one that maximizes roc_auc_score(average='micro') or another metric?

Perhaps it is possible to write a custom function in Python that optimizes the threshold for a chosen accuracy metric.

Upvotes: 5

Views: 10055

Answers (3)

syltruong

Reputation: 2723

I assume your ground-truth labels are Y_test and your predicted probabilities are probabilities (the float table above).

Optimizing roc_auc_score(average='micro') over a prediction threshold does not make sense: AUCs are computed from how predictions are ranked, so they need the predictions as float values in [0,1] and are unchanged by thresholding.

Therefore, I will discuss accuracy_score.

You could use scipy.optimize.fmin:

import scipy.optimize
from sklearn.metrics import accuracy_score

def thr_to_accuracy(thr, Y_test, probabilities):
    # negated because fmin minimizes
    return -accuracy_score(Y_test, (probabilities > thr).astype(int))

best_thr = scipy.optimize.fmin(thr_to_accuracy, args=(Y_test, probabilities), x0=0.5)
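
Note that accuracy is a step function of the threshold (it only changes when the threshold crosses one of the predicted probabilities), so fmin's simplex search can stall on a flat region. A minimal brute-force sweep is a robust alternative, assuming probabilities is the float table from the question:

import numpy as np
from sklearn.metrics import accuracy_score

# evaluate accuracy on a grid of candidate thresholds and keep the best one
candidates = np.linspace(0.0, 1.0, 101)
scores = [accuracy_score(Y_test, (probabilities > t).astype(int)) for t in candidates]
best_thr = candidates[int(np.argmax(scores))]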

Upvotes: 5

allenyllee

Reputation: 1074

According to @cangrejo's answer (https://stats.stackexchange.com/a/310956/194535): suppose the original output probability of your model is the vector v. Then you can define the prior distribution

π = (1/θ1, 1/θ2, ..., 1/θN), for θi ∈ (0,1) and Σθi = 1,

where N is the total number of labeled classes and i is the class index.

Take v' = v ⊙ π as the new output probability of your model, where ⊙ denotes an element-wise product.

Now your question can be reformulated as: find the π which optimizes the metric you have specified (e.g. roc_auc_score) on the new output probabilities. Once you find it, the θ's (θ1, θ2, ..., θN) are your optimal thresholds, one per class.
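
For intuition, here is a tiny numeric sketch with three hypothetical classes (the values are invented for illustration):

import numpy as np

v = np.array([0.30, 0.45, 0.25])       # original output probabilities
theta = np.array([0.20, 0.60, 0.20])   # per-class thresholds, summing to 1
v_new = v / theta                      # v' = v ⊙ π with π = 1/θ
v_new = v_new / v_new.sum()            # renormalize into a probability vector
print(np.argmax(v), np.argmax(v_new))  # 1 -> 0: class 0 clears its low threshold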

The Code part:


  1. Create a proxyModel class which takes your original model object as an argument and returns a proxyModel object. When you call predict_proba() on the proxyModel object, it will automatically compute the new probabilities based on the thresholds you specify:

    import numpy as np

    class proxyModel():
        def __init__(self, origin_model):
            self.origin_model = origin_model

        def predict_proba(self, x, threshold_list=None):
            # get the original probabilities
            ori_proba = self.origin_model.predict_proba(x)

            # default: a threshold of 1 for every class (no rescaling)
            if threshold_list is None:
                threshold_list = np.full(ori_proba[0].shape, 1)

            # get the output shape of threshold_list
            output_shape = np.array(threshold_list).shape

            # element-wise division by the threshold of each class
            new_proba = np.divide(ori_proba, threshold_list)

            # L1 norm (sum of the new probabilities in each row)
            norm = np.linalg.norm(new_proba, ord=1, axis=1)

            # reshape the norm so it broadcasts across the class columns
            norm = np.broadcast_to(np.array([norm]).T, (norm.shape[0], output_shape[0]))

            # renormalize the new probabilities
            new_proba = np.divide(new_proba, norm)

            return new_proba

        def predict(self, x, threshold_list=None):
            return np.argmax(self.predict_proba(x, threshold_list), axis=1)
    
  2. Implement a score function:

    from sklearn.metrics import accuracy_score
    from sklearn.metrics import roc_auc_score
    from sklearn.metrics import average_precision_score
    from sklearn.metrics import f1_score

    def scoreFunc(model, X, y_true, threshold_list):
        y_pred = model.predict(X, threshold_list=threshold_list)
        y_pred_proba = model.predict_proba(X, threshold_list=threshold_list)

        accuracy = accuracy_score(y_true, y_pred)
        roc_auc = roc_auc_score(y_true, y_pred_proba, average='macro')
        pr_auc = average_precision_score(y_true, y_pred_proba, average='macro')
        f1_value = f1_score(y_true, y_pred, average='macro')

        return accuracy, roc_auc, pr_auc, f1_value
    
    
  3. Define a weighted_score_with_threshold() function, which takes the thresholds as input and returns a weighted score:

    def weighted_score_with_threshold(threshold, model, X_test, Y_test, metrics='accuracy', delta=5e-5):
        # if the sum of thresholds is not between 1-delta and 1+delta,
        # return infinity (this shrinks the search space of the minimization
        # algorithm, because the sum of thresholds should be as close to 1 as possible)
        threshold_sum = np.sum(threshold)

        if threshold_sum > 1+delta:
            return np.inf

        if threshold_sum < 1-delta:
            return np.inf

        # avoid the objective function jumping to a nan solution
        if np.isnan(threshold_sum):
            print("threshold_sum is nan")
            return np.inf

        # renormalize: the sum of thresholds should be 1
        normalized_threshold = threshold/threshold_sum

        # calculate scores based on the thresholds
        # scoreFunc returns 4 scores in a tuple: (accuracy, roc_auc, pr_auc, f1)
        scores = scoreFunc(model, X_test, Y_test, threshold_list=normalized_threshold)

        scores = np.array(scores)
        weight = np.array([1,1,1,1])

        # give the metric you want to maximize a bigger weight:
        if metrics == 'accuracy':
            weight = np.array([10,1,1,1])
        elif metrics == 'roc_auc':
            weight = np.array([1,10,1,1])
        elif metrics == 'pr_auc':
            weight = np.array([1,1,10,1])
        elif metrics == 'f1':
            weight = np.array([1,1,1,10])
        elif metrics == 'all':
            weight = np.array([1,1,1,1])

        # return the negative weighted sum (maximizing the sum is
        # equivalent to minimizing the negative sum)
        return -np.dot(weight, scores)
    
  4. Use the optimization algorithm differential_evolution() (better than fmin here) to find the optimal thresholds:

    from scipy import optimize

    output_class_num = Y_test.shape[1]
    bounds = optimize.Bounds([1e-5]*output_class_num, [1]*output_class_num)

    pmodel = proxyModel(model)

    result = optimize.differential_evolution(weighted_score_with_threshold, bounds, args=(pmodel, X_test, Y_test, 'accuracy'))

    # calculate the thresholds (renormalized to sum to 1)
    threshold = result.x/np.sum(result.x)

    # print the optimized scores (note: score the proxy model, not the original)
    print(scoreFunc(pmodel, X_test, Y_test, threshold_list=threshold))
    
    

Upvotes: 5

Frayal

Reputation: 2161

The best way to do this is to put a logistic regression on top of your new dataset. It will multiply every probability by a certain constant and thus provide an automatic threshold on the output (with the LR you just need to predict the class, not the probabilities).

You need to train this by splitting the test set in two: use one part to train the LR on the outputs the NN predicted.

This is not the only way to do it, but it works fine for me everytime.

We have X_train_nn, X_valid_nn, and X_test_NN, and we subdivide X_test_NN into X_train_LR and X_test_LR (or do a stratified K-fold as you wish). Here is a sample of the code:

from sklearn import linear_model

# the NN's predicted probabilities become the LR's input features
X_train = NN.predict_proba(X_train_LR)
X_test = NN.predict_proba(X_test_LR)

# Y_train and Y_test are the labels matching the LR train/test splits
logistic = linear_model.LogisticRegression(C=1.0, penalty='l2')
logistic.fit(X_train, Y_train)
logistic.score(X_test, Y_test)

You consider your output as a new dataset and train an LR on this new dataset.
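
For the multi-label setting of the question, here is a minimal end-to-end sketch of this stacking idea; nn_model, X_test_NN, and Y_test_NN are hypothetical names for the trained network and its held-out data, and OneVsRestClassifier fits one LR per label:

from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# split the held-out set in two: one half fits the LR, the other scores it
half = len(X_test_NN) // 2
X_train_LR, X_test_LR = X_test_NN[:half], X_test_NN[half:]
Y_train_LR, Y_test_LR = Y_test_NN[:half], Y_test_NN[half:]

# the NN's predicted probabilities become the LR's input features
feats_train = nn_model.predict_proba(X_train_LR)  # hypothetical trained model
feats_test = nn_model.predict_proba(X_test_LR)

# one logistic regression per label handles the multi-label case
stacker = OneVsRestClassifier(LogisticRegression(C=1.0, penalty='l2'))
stacker.fit(feats_train, Y_train_LR)
print(stacker.score(feats_test, Y_test_LR))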

Upvotes: 1
