How to get the best threshold for classification using H2O in Python

Question

I have a classification model built with H2O in Python, for which the AUC is 71%.

But the accuracy based on the confusion matrix is only 61%. I understand that the confusion matrix is based on a 0.5 threshold.

How do I determine the threshold at which the accuracy will be 71%?

Neema Mashayekhi · Accepted Answer

The AUC of the ROC curve is not accuracy, and its value is threshold independent. It measures how well separated the two classes are. The 71% value is the probability that a randomly sampled positive example receives a higher predicted probability than a randomly sampled negative example. Because of this, there is no guarantee that any threshold gives an accuracy of 71%. See this explanation.
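To make the ranking interpretation concrete, here is a minimal sketch in plain Python (it does not use H2O). The function name `pairwise_auc` and the toy labels and scores are made up for illustration. It counts, over every positive/negative pair, how often the positive example gets the higher score, counting ties as half:

```python
from itertools import product


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives, 3 negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

auc = pairwise_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

No threshold appears anywhere in this calculation, which is why AUC and accuracy can differ.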

The choice of threshold should depend on your cost matrix, that is, how heavily you penalize false positives versus false negatives. Select the threshold that maximizes the metric you care about (for example F1, precision, or accuracy). H2O reports several of these: if you call the model performance (Python example: your_model.model_performance()), the output lists the threshold that maximizes accuracy along with the thresholds for the other optimized metrics.
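The idea of picking the threshold that maximizes a metric can be sketched without H2O. The example below is only an illustration: the helper names `accuracy_at` and `best_accuracy_threshold` and the toy data are hypothetical, and it assumes an example is predicted positive when its score is at or above the threshold. It tries each observed score as a candidate threshold and keeps the one with the highest accuracy:

```python
def accuracy_at(labels, scores, threshold):
    """Accuracy when predicting the positive class for scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def best_accuracy_threshold(labels, scores):
    """Return (threshold, accuracy) for the candidate with the highest accuracy.

    Candidates are the distinct observed scores; on ties, the lowest
    threshold wins because max() keeps the first maximum it sees.
    """
    candidates = sorted(set(scores))
    return max(
        ((t, accuracy_at(labels, scores, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical toy data: 3 positives, 3 negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

default_acc = accuracy_at(labels, scores, 0.5)
best_t, best_acc = best_accuracy_threshold(labels, scores)
print(f"accuracy at 0.5: {default_acc:.3f}")
print(f"best threshold: {best_t}, accuracy: {best_acc:.3f}")
```

With a real H2O model you would not need to write this sweep yourself. To my knowledge, the binomial metrics object returned by `model_performance()` also offers `find_threshold_by_max_metric("accuracy")` for reading the threshold programmatically, but check this against the documentation for your H2O version.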
