user1342124

Reputation: 651

H2O: Is there a way to fix the threshold used in H2ORandomForestEstimator performance metrics during training and testing?

I have built a model with H2ORandomForestEstimator, and the results look like the output below.

The threshold keeps changing (0.5 on training and 0.313725489027 on validation), and I would like to fix the threshold in H2ORandomForestEstimator so that results are comparable during fine-tuning. Is there a way to set the threshold?

According to http://h2o-release.s3.amazonaws.com/h2o/master/3484/docs-website/h2o-py/docs/modeling.html#h2orandomforestestimator, there is no such parameter.

If there is no way to set this, how do we know what threshold our model is built on?

rf_v1
** Reported on train data. **

MSE:    2.75013548238e-05  
RMSE:   0.00524417341664  
LogLoss:0.000494320913199  
Mean Per-Class Error: 0.0188802936476  
AUC: 0.974221763605  
Gini: 0.948443527211  
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.5:
       0       1    Error    Rate
-----  ------  ---  -------  --------------  
0      161692  1    0        (1.0/161693.0)  
1      3       50   0.0566   (3.0/53.0)  
Total  161695  51   0        (4.0/161746.0)  
Maximum Metrics: Maximum metrics at their respective thresholds

metric                       threshold    value     idx
---------------------------  -----------  --------  -----  
max f1                       0.5          0.961538  19  
max f2                       0.25         0.955056  21  
max f0point5                 0.571429     0.983936  18  
max accuracy                 0.571429     0.999975  18  
max precision                1            1         0  
max recall                   0            1         69  
max specificity              1            1         0  
max absolute_mcc             0.5          0.961704  19  
max min_per_class_accuracy   0.25         0.962264  21  
max mean_per_class_accuracy  0.25         0.98112   21  
Gains/Lift Table: Avg response rate:  0.03 %

** Reported on validation data. **

MSE:      1.00535766226e-05  
RMSE:     0.00317073755183  
LogLoss:  4.53885183426e-05  
Mean Per-Class Error: 0.0  
AUC: 1.0  
Gini: 1.0  
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.313725489027:
       0      1    Error    Rate
-----  -----  ---  -------  -------------  
0      53715  0    0        (0.0/53715.0)  
1      0      16   0        (0.0/16.0)  
Total  53715  16   0        (0.0/53731.0)  
Maximum Metrics: Maximum metrics at their respective thresholds

metric                       threshold    value    idx
---------------------------  -----------  -------  -----  
max f1                       0.313725     1        5  
max f2                       0.313725     1        5  
max f0point5                 0.313725     1        5  
max accuracy                 0.313725     1        5  
max precision                1            1        0  
max recall                   0.313725     1        5  
max specificity              1            1        0  
max absolute_mcc             0.313725     1        5  
max min_per_class_accuracy   0.313725     1        5  
max mean_per_class_accuracy  0.313725     1        5

Upvotes: 1

Views: 1274

Answers (1)

TomKraljevic

Reputation: 3671

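The threshold shown is the one that maximizes F1 on the data being scored (the "max f1" row in the output). Because it is recomputed for each dataset, it differs between the training and validation reports.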
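To illustrate why a max-F1 threshold moves from one dataset to another, here is a minimal pure-Python sketch. It is not H2O's implementation (H2O's internals may bin thresholds and treat ties differently); the function names and the toy data are made up for illustration, and the `>=` comparison is an assumption.

```python
def f1_at_threshold(labels, probs, threshold):
    """F1 score when every probability >= threshold is predicted as class 1."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def max_f1_threshold(labels, probs):
    """Return the candidate threshold (taken from the observed probabilities)
    with the highest F1; on ties, the lowest such threshold wins."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f1_at_threshold(labels, probs, t))


# Hypothetical toy data: same labels, different predicted probabilities.
train_labels, train_probs = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
valid_labels, valid_probs = [0, 0, 1, 1], [0.1, 0.2, 0.6, 0.9]

train_t = max_f1_threshold(train_labels, train_probs)
valid_t = max_f1_threshold(valid_labels, valid_probs)
print(train_t, valid_t)
```

The two datasets end up with different max-F1 thresholds even though nothing about the "model" changed, which is the same effect seen in the train and validation reports above.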

If you want to apply your own threshold, you will have to take the probability of the positive class and compare it yourself to produce the label you want.
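A minimal sketch of that comparison, written in plain Python so it runs without an H2O cluster. The helper name `apply_threshold` and the sample probabilities are hypothetical. With H2O you would first pull the positive-class probabilities out of the prediction frame; as far as I know that column is named `p1` for a binary model, but treat that as an assumption and check your own output.

```python
def apply_threshold(positive_probs, threshold=0.5):
    """Label each row 1 if its positive-class probability is >= threshold, else 0."""
    return [1 if p >= threshold else 0 for p in positive_probs]


# Hypothetical positive-class probabilities, e.g. extracted from a prediction frame.
probs = [0.10, 0.313725, 0.50, 0.90]

# The same fixed threshold can be reused on train, validation and test data.
labels_fixed = apply_threshold(probs, threshold=0.5)
labels_lower = apply_threshold(probs, threshold=0.313725)
print(labels_fixed)
print(labels_lower)
```

Using one fixed threshold for every dataset keeps the resulting confusion matrices comparable across tuning runs, at the cost of no longer being the F1-optimal cut for each individual dataset.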

If you use your web browser to connect to the H2O Flow Web UI inside of H2O-3, you can mouse over the ROC curve and visually browse the confusion matrix for each threshold, which is convenient.

Upvotes: 2
