Reputation: 2221
According to the H2O documentation, the threshold used at prediction time is the max F1 threshold from training. The performance function,
h2o.performance(model, newdata = test)
actually runs the prediction on the test set in order to compute the confusion matrix.
Strangely, I am getting a different confusion matrix when predicting on the same test set using:
h2o.predict(object, newdata = test).
This means that h2o.performance() is using a different threshold than h2o.predict().
I am wondering how I can dictate the threshold used at prediction time.
Upvotes: 2
Views: 381
Reputation: 930
H2O uses the max F1 threshold for both h2o.performance() and h2o.predict(). The difference is which dataset is used to estimate that threshold.
h2o.predict() will use the threshold selected during training. Which max F1 threshold that is depends on how the model was trained: as explained in the documentation and on Stack Overflow, if you supplied a validation dataset during training, the max F1 threshold is determined from the validation data; otherwise it comes from the training data.
h2o.performance() will take the model and newdata and calculate which threshold gives the highest F1 on the new data. In your case, test is being used to calculate the max F1 threshold, which is why the two confusion matrices differ.
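To dictate the threshold yourself, you can label the predictions manually from the p1 probability column. A minimal R sketch, assuming a trained binomial model `model`, an H2OFrame `test`, and an illustrative threshold of 0.35 (all three are assumptions, not from the question):

```r
library(h2o)

# Assumed: `model` is a trained binomial model, `test` is an H2OFrame
pred <- h2o.predict(model, newdata = test)

# Option 1: pick a threshold yourself and re-label using the p1 column
my_threshold <- 0.35  # illustrative value, choose your own
pred$my_label <- ifelse(pred$p1 > my_threshold, 1, 0)

# Option 2: extract the max F1 threshold that h2o.performance() found
# on the test set, then apply it the same way
perf <- h2o.performance(model, newdata = test)
test_f1_threshold <- h2o.find_threshold_by_max_metric(perf, "f1")
pred$f1_label <- ifelse(pred$p1 > test_f1_threshold, 1, 0)
```

With Option 2, the manually re-labeled predictions should match the confusion matrix reported by h2o.performance() on the test set.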
Upvotes: 2