Reputation: 1521
I am running H2O in Python and have built a GBM model for a binary target variable (1 vs 0). The model performs well, and I can see the threshold in the output. However, I want to save the threshold to a variable (call it cut_point) so that when I score a new data set, I can use it to classify each row as either 1 or 0. Has anyone done this before?
Upvotes: 0
Views: 714
Reputation: 3378
Alternatively, for finding thresholds that maximize F1-scores, one can use:
model.F1(train=True, valid=True, xval=False)
The sample output of the line above:
{u'train': [[0.3869697386893616, 0.7451099672437997]], u'valid': [[0.35417599264806404, 0.7228980805623143]]}
Each key maps to a list containing one [threshold, F1] pair. The first value of the pair (index 0) is the threshold that maximizes the F1-score for that data set; the second value (index 1) is the maximum F1-score itself. To extract the threshold for, say, the validation frame, one can use:
values = model.F1(train=True, valid=True, xval=False)
cut_point = values.get('valid')[0][0]
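As a rough sketch of the indexing described above, the helper below pulls the threshold out of a dict shaped like the sample output. The function name `threshold_from_metric_output` is hypothetical, and the dict is the sample output copied by hand rather than a live H2O call, so the snippet runs without a cluster.

```python
def threshold_from_metric_output(metric_output, key="valid"):
    """Return the threshold from a dict shaped like the sample output.

    Assumption: each key maps to a list holding one [threshold, metric] pair.
    """
    pairs = metric_output.get(key)
    if not pairs:
        raise KeyError("no metric values for key %r" % key)
    threshold, _metric_value = pairs[0]
    return threshold


# Sample output copied from the answer (not a live H2O call).
sample = {
    "train": [[0.3869697386893616, 0.7451099672437997]],
    "valid": [[0.35417599264806404, 0.7228980805623143]],
}

cut_point = threshold_from_metric_output(sample, key="valid")
print(cut_point)
```

With a real model, `sample` would presumably be replaced by the value returned from `model.F1(train=True, valid=True, xval=False)`.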
This method also works for the following metrics:
Upvotes: 0
Reputation: 5778
You can use find_threshold_by_max_metric:
model.find_threshold_by_max_metric('f1', train=True, valid=False, xval=False)
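To address the scoring half of the question, here is a minimal sketch of applying a saved threshold to predicted probabilities. It is plain Python so it runs without H2O: `apply_cut_point` is a hypothetical helper, the probabilities are made-up stand-ins, and treating a probability equal to the threshold as class 1 is an assumption. With H2O, `cut_point` would be the value returned by the call above (assuming it returns a single number when only one of train/valid/xval is True), and the probabilities would presumably come from the `p1` column of `model.predict(new_data)`.

```python
def apply_cut_point(p1_values, cut_point):
    """Label each positive-class probability as 1 or 0 using a fixed threshold."""
    # Assumption: a probability equal to the threshold is labeled 1 (>=).
    return [1 if p >= cut_point else 0 for p in p1_values]


# Hypothetical probabilities standing in for the p1 column of a scored frame.
p1_values = [0.10, 0.3869697386893616, 0.92]

# Training threshold taken from the sample F1 output earlier in the thread.
cut_point = 0.3869697386893616

labels = apply_cut_point(p1_values, cut_point)
print(labels)
```

Keeping the threshold in a variable, rather than relying on the model's default `predict` column, makes it possible to reuse the same cut-off on every new data set.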
Upvotes: 2