Reputation: 645
I have a classification model built with H2O in Python, for which the AUC is 71%.
But the accuracy based on the confusion matrix is only 61%. I understand that the confusion matrix is based on a 0.5 threshold.
How do I determine the threshold at which the accuracy will be 71%?
Upvotes: 0
Views: 1106
Reputation: 930
The AUC of the ROC curve is not accuracy, and its value is threshold-independent. It is a measure of how well separated the two classes are. The 71% value is the probability that a randomly sampled positive example receives a higher predicted probability than a randomly sampled negative example. See this explanation.
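To make the ranking interpretation concrete, here is a minimal sketch in plain Python (no H2O needed). The scores and labels are made-up toy data and `pairwise_auc` is a hypothetical helper name, not part of any library. It counts how often a positive outscores a negative, with ties counted as half.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = pairwise_auc(scores, labels)
print(auc)  # 13 of the 16 pairs are ordered correctly -> 0.8125
```

No threshold appears anywhere in this computation, which is why an AUC of 71% does not imply that some threshold yields 71% accuracy.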
Selecting the threshold should depend on your cost matrix (how heavily you penalize false positives versus false negatives). You would want to select the threshold that maximizes your desired metric (e.g. max F1, precision, or accuracy). H2O gives multiple options: if you call the model performance (Python example: `your_model.model_performance()`), you will get the threshold for max accuracy and other optimized metrics listed.
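As a rough illustration of what "threshold for max accuracy" means, the sketch below sweeps candidate thresholds over made-up predictions in plain Python. The data and the helper names `accuracy_at` and `best_accuracy_threshold` are assumptions for this example. This is not how H2O computes it internally, only the same idea in miniature.

```python
def accuracy_at(threshold, scores, labels):
    """Accuracy when every score >= threshold is predicted as class 1."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


def best_accuracy_threshold(scores, labels):
    """Try each distinct score as a threshold; return (threshold, accuracy)."""
    candidates = sorted(set(scores))
    return max(
        ((t, accuracy_at(t, scores, labels)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy predictions (assumed for illustration only).
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 1, 1, 1, 1, 1, 0, 1, 1]

default_acc = accuracy_at(0.5, scores, labels)
best_t, best_acc = best_accuracy_threshold(scores, labels)
print(default_acc)        # accuracy at the default 0.5 cutoff
print(best_t, best_acc)   # cutoff with the highest accuracy on this data
```

On this toy data the default 0.5 cutoff is far from the best one, which mirrors the situation in the question. In H2O the metrics object returned by `model_performance()` reports the equivalent value, and as far as I know it also offers `find_threshold_by_max_metric("accuracy")`, though that is worth checking against the documentation for your version.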
Upvotes: 1