Reputation: 747
The output of my neural network is a table of predicted class probabilities for multi-label classification:
print(probabilities)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|--------------|--------------|-----|--------------|--------------|--------------|
| 0 | 2.442745e-05 | 5.952136e-06 | ... | 4.254002e-06 | 1.894523e-05 | 1.033957e-05 |
| 1 | 7.685694e-05 | 3.252202e-06 | ... | 3.617730e-06 | 1.613792e-05 | 7.356643e-06 |
| 2 | 2.296657e-06 | 4.859554e-06 | ... | 9.934525e-06 | 9.244772e-06 | 1.377618e-05 |
| 3 | 5.163169e-04 | 1.044035e-04 | ... | 1.435158e-04 | 2.807420e-04 | 2.346930e-04 |
| 4 | 2.484626e-06 | 2.074290e-06 | ... | 9.958628e-06 | 6.002510e-06 | 8.434519e-06 |
| 5 | 1.297477e-03 | 2.211737e-04 | ... | 1.881772e-04 | 3.171079e-04 | 3.228884e-04 |
I converted it to class labels using a threshold (0.2) to measure the accuracy of my predictions:
predictions = (probabilities > 0.2).astype(int)
print(predictions)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... | 0 | 0 | 0 |
| 1 | 0 | 0 | ... | 0 | 0 | 0 |
| 2 | 0 | 0 | ... | 0 | 0 | 0 |
| 3 | 0 | 0 | ... | 0 | 0 | 0 |
| 4 | 0 | 0 | ... | 0 | 0 | 0 |
| 5 | 0 | 0 | ... | 0 | 0 | 0 |
Also I have a test set:
print(Y_test)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... | 0 | 0 | 0 |
| 1 | 0 | 0 | ... | 0 | 0 | 0 |
| 2 | 0 | 0 | ... | 0 | 0 | 0 |
| 3 | 0 | 0 | ... | 0 | 0 | 0 |
| 4 | 0 | 0 | ... | 0 | 0 | 0 |
| 5 | 0 | 0 | ... | 0 | 0 | 0 |
Question: How can I build an algorithm in Python that chooses the optimal threshold, i.e. one that maximizes roc_auc_score(average='micro') or another metric?
Maybe it is possible to build a manual function in Python that optimizes the threshold, depending on the accuracy metric.
Upvotes: 5
Views: 10055
Reputation: 2723
I assume your ground-truth labels are Y_test and your predictions are predictions.
Optimizing roc_auc_score(average='micro') with respect to a prediction threshold does not make sense: AUC is computed from how the predictions are ranked, and therefore needs predictions as float values in [0,1].
Therefore, I will discuss accuracy_score instead.
You could use scipy.optimize.fmin:

import numpy as np
import scipy.optimize
from sklearn.metrics import accuracy_score

def thr_to_accuracy(thr, Y_test, predictions):
    # negate the accuracy because fmin minimizes its objective
    return -accuracy_score(Y_test, (predictions > thr).astype(int))

best_thr = scipy.optimize.fmin(thr_to_accuracy, args=(Y_test, predictions), x0=0.5)
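Note that fmin performs a local simplex search and accuracy is a step function of the threshold, so it can stall on a plateau. As a sanity check, a coarse grid search is easy to compare against (a minimal sketch, assuming predictions holds the float probabilities as in the function above; candidate_thrs and best_thr_grid are names introduced here):

import numpy as np
from sklearn.metrics import accuracy_score

# score every candidate threshold on a fixed grid and keep the best
candidate_thrs = np.linspace(0.01, 0.99, 99)
scores = [accuracy_score(Y_test, (predictions > thr).astype(int))
          for thr in candidate_thrs]
best_thr_grid = candidate_thrs[int(np.argmax(scores))]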
Upvotes: 5
Reputation: 1074
Following @cangrejo's answer (https://stats.stackexchange.com/a/310956/194535): suppose the original output probability of your model is the vector v. You can then define the prior distribution
π = (1/θ1, 1/θ2, ..., 1/θN), with θi ∈ (0,1) and Σθi = 1,
where N is the total number of labeled classes and i is the class index.
Take v' = v⊙π as the new output probability of your model, where ⊙ denotes an element-wise product.
Now your question can be reformulated as: find the π which optimizes the metric you have specified (e.g. roc_auc_score) on the new output probabilities. Once you find it, the θs (θ1, θ2, ..., θN) are your optimal per-class thresholds.
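For intuition, a tiny worked example with hypothetical numbers:

import numpy as np

v = np.array([0.2, 0.5, 0.3])        # hypothetical model output
theta = np.array([0.5, 0.25, 0.25])  # per-class thresholds, summing to 1
v_new = v / theta                    # v' = v ⊙ π, with π = 1/θ element-wise
v_new /= v_new.sum()                 # renormalize to a probability vector
print(v_new)                         # ≈ [0.111, 0.556, 0.333]

Lowering a class's threshold inflates its rescaled probability, so taking the argmax over v' implements per-class thresholds.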
The code part:
Create a proxyModel class which takes your original model object as an argument and returns a proxyModel object. When you call predict_proba() through the proxyModel object, it will calculate the new probabilities automatically, based on the thresholds you specify:
import numpy as np

class proxyModel():
    def __init__(self, origin_model):
        self.origin_model = origin_model

    def predict_proba(self, x, threshold_list=None):
        # get the original probabilities
        ori_proba = self.origin_model.predict_proba(x)

        # default: a threshold of 1 for every class (no rescaling)
        if threshold_list is None:
            threshold_list = np.full(ori_proba[0].shape, 1)

        # get the output shape of threshold_list
        output_shape = np.array(threshold_list).shape

        # element-wise division by the threshold of each class
        new_proba = np.divide(ori_proba, threshold_list)

        # calculate the norm (sum of the new probabilities of each sample)
        norm = np.linalg.norm(new_proba, ord=1, axis=1)

        # reshape the norm so it broadcasts over the class axis
        norm = np.broadcast_to(np.array([norm]).T, (norm.shape[0], output_shape[0]))

        # renormalize the new probabilities
        new_proba = np.divide(new_proba, norm)

        return new_proba

    def predict(self, x, threshold_list=None):
        return np.argmax(self.predict_proba(x, threshold_list), axis=1)
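Usage is then a drop-in wrap of your fitted model (a sketch; model stands for any fitted classifier exposing predict_proba(), and X_test is assumed to hold the inputs corresponding to Y_test):

pmodel = proxyModel(model)

# uniform thresholds rescale every class equally, so this
# reproduces the original argmax predictions
n_classes = Y_test.shape[1]
uniform = np.full(n_classes, 1.0 / n_classes)
print(pmodel.predict(X_test, threshold_list=uniform))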
Implement a score function:
from sklearn.metrics import accuracy_score, roc_auc_score, average_precision_score, f1_score

def scoreFunc(model, X, y_true, threshold_list):
    y_pred = model.predict(X, threshold_list=threshold_list)
    y_pred_proba = model.predict_proba(X, threshold_list=threshold_list)

    # predict() returns argmax class indices, so reduce the
    # indicator-format y_true to class indices for the label-based metrics
    y_true_labels = np.argmax(np.asarray(y_true), axis=1)

    ###### metrics ######
    accuracy = accuracy_score(y_true_labels, y_pred)
    roc_auc = roc_auc_score(y_true, y_pred_proba, average='macro')
    pr_auc = average_precision_score(y_true, y_pred_proba, average='macro')
    f1_value = f1_score(y_true_labels, y_pred, average='macro')

    return accuracy, roc_auc, pr_auc, f1_value
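For example, you can record a baseline with uniform thresholds before optimizing (a sketch, reusing the pmodel and uniform names introduced above):

baseline = scoreFunc(pmodel, X_test, Y_test, threshold_list=uniform)
print(baseline)  # (accuracy, roc_auc, pr_auc, f1) before threshold tuning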
Define a weighted_score_with_threshold() function, which takes the thresholds as input and returns a weighted score:
def weighted_score_with_threshold(threshold, model, X_test, Y_test, metrics='accuracy', delta=5e-5):
    # If the sum of thresholds is not between 1-delta and 1+delta,
    # return infinity. This just reduces the search space of the
    # minimization algorithm, because the sum of thresholds should be
    # as close to 1 as possible.
    threshold_sum = np.sum(threshold)

    if threshold_sum > 1 + delta:
        return np.inf
    if threshold_sum < 1 - delta:
        return np.inf

    # avoid the objective function jumping to a nan solution
    if np.isnan(threshold_sum):
        print("threshold_sum is nan")
        return np.inf

    # renormalize: the sum of the thresholds should be 1
    normalized_threshold = threshold / threshold_sum

    # calculate scores based on the thresholds;
    # scoreFunc() returns 4 scores in a tuple: (accuracy, roc_auc, pr_auc, f1)
    scores = scoreFunc(model, X_test, Y_test, threshold_list=normalized_threshold)
    scores = np.array(scores)
    weight = np.array([1, 1, 1, 1])

    # give the metric you want to maximize a bigger weight:
    if metrics == 'accuracy':
        weight = np.array([10, 1, 1, 1])
    elif metrics == 'roc_auc':
        weight = np.array([1, 10, 1, 1])
    elif metrics == 'pr_auc':
        weight = np.array([1, 1, 10, 1])
    elif metrics == 'f1':
        weight = np.array([1, 1, 1, 10])
    elif metrics == 'all':
        weight = np.array([1, 1, 1, 1])

    # return the negative weighted sum (maximizing the sum is
    # equivalent to minimizing the negative sum)
    return -np.dot(weight, scores)
Use the optimization algorithm differential_evolution() (more robust than fmin for this search) to find the optimal thresholds:
from scipy import optimize

output_class_num = Y_test.shape[1]
bounds = optimize.Bounds([1e-5] * output_class_num, [1] * output_class_num)
pmodel = proxyModel(model)

result = optimize.differential_evolution(weighted_score_with_threshold, bounds,
                                         args=(pmodel, X_test, Y_test, 'accuracy'))

# calculate the normalized thresholds
threshold = result.x / np.sum(result.x)

# print the optimized scores (score the wrapped pmodel, not the raw model)
print(scoreFunc(pmodel, X_test, Y_test, threshold_list=threshold))
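Note that differential_evolution() is stochastic, so the found thresholds can vary across runs; if you need reproducible results, pass its seed argument.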
Upvotes: 5
Reputation: 2161
The best way to do this is to put a logistic regression on top of your new dataset. It will multiply every probability by a certain constant and thus provide an automatic threshold on the output (with the LR you just need to predict the class, not the probabilities).
You need to train it by subdividing the test set in two, and use one part to train the LR after predicting the outputs with the NN.
This is not the only way to do it, but it works fine for me every time.
Suppose we have X_train_NN, X_valid_NN, and X_test_NN, and we subdivide X_test_NN into X_train_LR and X_test_LR (or do a stratified K-fold if you wish). Here is a sample of the code:
from sklearn import linear_model

# the NN's predicted probabilities become the LR's input features
X_train = NN.predict_proba(X_train_LR)
X_test = NN.predict_proba(X_test_LR)

logistic = linear_model.LogisticRegression(C=1.0, penalty='l2')
logistic.fit(X_train, Y_train_LR)
logistic.score(X_test, Y_test_LR)
You treat your NN's output as a new dataset and train an LR on this new dataset.
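A minimal sketch of the subdivision step with train_test_split (Y_test_NN here is a hypothetical name for the labels matching X_test_NN):

from sklearn.model_selection import train_test_split

# hold out half of the NN test data to fit the LR on top of the NN outputs
X_train_LR, X_test_LR, Y_train_LR, Y_test_LR = train_test_split(
    X_test_NN, Y_test_NN, test_size=0.5, random_state=0)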
Upvotes: 1