Reputation: 1193
I am building a Random Forest model using a grid search with the H2O Python API. I split the data into training and validation sets and use k-fold cross-validation to select the best model in the grid search.
I am able to retrieve the model with the best MSE on the training set, but I want to retrieve the model with the highest AUC on the validation set.
I could code everything in Python, but I was wondering whether there is an H2O approach to solve this. Any suggestions on how I could do this?
Upvotes: 2
Views: 847
Reputation: 28928
If g is your grid object, then:
g.sort_by('auc', False)
will give you the models ordered by AUC. The second parameter, False, means the highest AUC comes first. It returns an H2OTwoDimTable object, so you can select the first model (the best model, by AUC) that way.
I believe it sorts by scores on the validation set, not the training set. However, you can specify this explicitly with:
g.sort_by('auc(valid=True)', False)
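If you would rather end up with the model object itself than a sorted table, a minimal sketch of the "code it in Python" route is below. It assumes that each model exposes auc(valid=True), as H2O binomial models do, and that a validation frame was supplied during training. The helper best_by_validation_auc and the FakeModel stand-in are hypothetical names used only so the snippet runs without an H2O cluster.

```python
def best_by_validation_auc(models):
    """Return the model with the highest AUC on the validation set.

    Assumes every element of `models` has an auc(valid=True) method.
    """
    return max(models, key=lambda m: m.auc(valid=True))


class FakeModel:
    """Hypothetical stand-in for a trained model, so the sketch is runnable."""

    def __init__(self, model_id, valid_auc):
        self.model_id = model_id
        self._valid_auc = valid_auc

    def auc(self, train=False, valid=False, xval=False):
        # This stand-in only stores a validation score.
        return self._valid_auc if valid else None


# Made-up scores purely for illustration.
candidates = [
    FakeModel("rf_1", 0.81),
    FakeModel("rf_2", 0.87),
    FakeModel("rf_3", 0.84),
]

best = best_by_validation_auc(candidates)
print(best.model_id)
```

With a real grid, the same helper could presumably be called as best_by_validation_auc(g.models), on the assumption that g.models holds the trained models of the grid.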
Upvotes: 3