Gavin

Reputation: 1521

How to save the threshold from the H2O model_performance function in Python?

I am running H2O in Python and have built a GBM model for a binary target variable (1 vs 0). The model performs well and I can see the threshold in the output, but I want to save that threshold to a variable (call it cut_point) so that when I score a new data set I can use it to assign either 1 or 0. Has anyone done this before?


Upvotes: 0

Views: 714

Answers (2)

Anastasiya-Romanova 秀

Reputation: 3378

Alternatively, for finding thresholds that maximize F1-scores, one can use:

model.F1(train=True, valid=True, xval=False)

The sample output of the line above:

{u'train': [[0.3869697386893616, 0.7451099672437997]], u'valid': [[0.35417599264806404, 0.7228980805623143]]}

Each key holds a list containing one [threshold, F1] pair. The first value of the pair (index 0) is the threshold that maximizes the F1-score for that data set, and the second value (index 1) is the maximum F1-score itself. To get the threshold for, say, the validation frame, one can use:

values = model.F1(train=True, valid=True, xval=False)
cut_point = values.get('valid')[0][0]  # [0] is the [threshold, F1] pair; [0][0] is the threshold
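To make the nesting concrete, here is a minimal sketch that unpacks the sample output shown above without needing a running H2O cluster. The dictionary is copied from that output; the helper name threshold_and_score and the variable f1_result are only illustrative and are not part of H2O.

```python
# Sample output copied from the answer above (Python 3 literal, no u'' prefixes).
f1_result = {
    "train": [[0.3869697386893616, 0.7451099672437997]],
    "valid": [[0.35417599264806404, 0.7228980805623143]],
}


def threshold_and_score(result, key):
    """Return (threshold, metric_value) for one data set key.

    Hypothetical helper: each key maps to a list holding a single
    [threshold, metric_value] pair, as in the sample output.
    """
    threshold, score = result[key][0]
    return threshold, score


# Save the validation threshold to a variable, as the question asks.
cut_point, best_f1 = threshold_and_score(f1_result, "valid")
print(cut_point, best_f1)
```

With a real model, f1_result would be replaced by the return value of model.F1(train=True, valid=True, xval=False).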

This method also works for the following metrics:

Upvotes: 0

Lauren

Reputation: 5778

You can use find_threshold_by_max_metric, which returns the threshold that maximizes the metric you name (here F1 on the training data):

model.find_threshold_by_max_metric('f1', train=True, valid=False, xval=False)
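A rough sketch of how the saved threshold might then be applied to new scores. The helpers get_cut_point and label_scores are hypothetical names, and _StubModel is only a stand-in so the snippet runs without an H2O cluster; with a real model you would pass the trained GBM instead. The sketch also assumes the positive-class probabilities (the p1 column of model.predict(...) for a binomial model) have already been pulled into a plain Python list, and that a score equal to the threshold counts as class 1.

```python
def get_cut_point(model, metric="f1"):
    """Ask the model for the threshold that maximizes `metric` on training data."""
    return float(
        model.find_threshold_by_max_metric(metric, train=True, valid=False, xval=False)
    )


def label_scores(p1_scores, cut_point):
    """Turn positive-class probabilities into 1/0 labels.

    Assumption: scores at or above the cut point are labeled 1.
    """
    return [1 if p >= cut_point else 0 for p in p1_scores]


class _StubModel:
    """Stand-in for a trained H2O binomial model; not part of H2O."""

    def find_threshold_by_max_metric(self, metric, train=False, valid=False, xval=False):
        # Returns the training threshold from the sample output in the other answer.
        return 0.3869697386893616


cut_point = get_cut_point(_StubModel())
labels = label_scores([0.10, 0.39, 0.80], cut_point)
print(cut_point, labels)
```

Keeping cut_point as a plain float makes it easy to store alongside the model and reuse whenever a new data set is scored.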

Upvotes: 2
