Andrea Mariani

H2O binary classification: understanding p0 and p1

I have already read this question: How should we interpret the results of the H2O predict function? I still don't understand whether p1 is a probability in [0, 1] that I can use just like a regression output, applying my own threshold.

Edit: thank you for your answer, but I still have some confusion, so let's dig into it. Suppose my outcome Y takes values in {0, 1}. If Y is numeric, I run it as a REGRESSION and get a single response column. If Y is a factor, I run it as a CLASSIFICATION and the output is prediction/p0/p1. Now, is p1 the same as what I would get using Y as numeric? Also, the calibrate_model parameter (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/calibrate_model.html) affects logloss, but is the max-F1 threshold still applied to p0/p1, or to the calibrated probabilities? Can I use the calibrated probabilities like regression outputs, since their logloss is supposed to be lower?

Answers (1)

Lauren

The output of a binary classification model in H2O gives you the predicted class label (using the threshold that maximizes the F1 score), the predicted value for class 0 (p0), and the predicted value for class 1 (p1).
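To make "the threshold that maximizes the F1 score" concrete, here is a minimal brute-force sketch in plain Python. It assumes you already have the p1 column and the true 0/1 labels as lists (the values below are made up), tries each distinct p1 value as a cutoff, and keeps the one with the best F1. This is not H2O's internal code, which may search a different set of candidate thresholds.

```python
def f1_at(threshold, p1, labels):
    """F1 score when rows with p1 >= threshold are predicted as class 1."""
    tp = sum(1 for p, y in zip(p1, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(p1, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(p1, labels) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def max_f1_threshold(p1, labels):
    """Try every distinct p1 value as a cutoff and keep the best-F1 one."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(p1)):
        f1 = f1_at(t, p1, labels)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical p1 scores and true labels standing in for H2O predict() output.
p1 = [0.10, 0.23, 0.45, 0.60, 0.89, 0.95]
labels = [0, 0, 1, 0, 1, 1]
print(max_f1_threshold(p1, labels))
```

Note that the chosen cutoff (0.45 here) need not be 0.5, which is why the predicted label can disagree with a naive "p1 > 0.5" rule.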

These predicted values are uncalibrated probabilities. If you want calibrated probabilities, set H2O's model argument calibrate_model to True.
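For intuition about what calibration does, the sketch below implements Platt scaling, the method H2O's calibrate_model documentation names: fit a logistic curve sigmoid(a * score + b) to held-out labels by minimizing log loss. It is plain Python with plain gradient descent, not H2O's implementation; the scores, labels, learning rate, and epoch count are hypothetical. In H2O itself, the calibration is fit on a separate calibration_frame that you pass alongside calibrate_model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(probs, labels):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    eps = 1e-15
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for p, y in zip(probs, labels)
    ) / len(labels)


def platt_fit(scores, labels, lr=0.1, epochs=5000):
    """Fit a, b so that sigmoid(a * score + b) approximates P(y = 1)."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            # For the logistic link, d(log loss)/dz = p - y.
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_apply(scores, a, b):
    return [sigmoid(a * s + b) for s in scores]


# Hypothetical uncalibrated p1 scores and labels from a held-out frame.
scores = [0.10, 0.23, 0.45, 0.60, 0.89, 0.95]
labels = [0, 0, 1, 0, 1, 1]
a, b = platt_fit(scores, labels)
calibrated = platt_apply(scores, a, b)
print(a, b, log_loss(calibrated, labels))
```

Because the fitted a is positive here, the mapping is monotonic: calibrated values rank rows in the same order as the raw scores, and only the values themselves shift.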

So, to answer your question: yes, p1 is the predicted value between 0 and 1 (for example, you will see values like .23, .45, .89). Because H2O builds regression trees, you could technically compute p1 as 1 - p0 (or vice versa). In fact, unless you set binomial_double_trees = True (a DRF option), this is exactly what H2O does: it builds a single regression tree for one of the classes and then takes 1 - (that class's value) to get the predicted value for the other class.
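Since p1 is just a number in [0, 1], nothing stops you from applying your own cutoff instead of the max-F1 one. A minimal sketch, assuming the p1 column has already been pulled out of the predict() output into a Python list (the values are the example ones from the answer); label_with_threshold is a hypothetical helper, not an H2O function:

```python
def label_with_threshold(p1_values, threshold):
    """Map p1 scores to 0/1 class labels using a cutoff of your choosing."""
    return [1 if p >= threshold else 0 for p in p1_values]


p1 = [0.23, 0.45, 0.89]
# p0 is the complement of p1, as described above.
p0 = [1.0 - p for p in p1]

print(label_with_threshold(p1, 0.5))  # → [0, 0, 1]
print(label_with_threshold(p1, 0.3))  # → [0, 1, 1]
```

Lowering the cutoff trades more false positives for fewer false negatives; which cutoff is "right" depends on your costs, not on the model.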
