Reputation: 3437
After using AutoML to generate the aml leaderboard, I ran
h2o.predict(aml@leader, test_df)
but how can I know which model on the leaderboard it is using? And if I want to inspect the structure or hyperparameters of any model on the leaderboard, how can I do so?
Also, the result on the test set is not nearly as good as the one on the validation set. Is this common, did I use it wrongly, or does it have a tendency to overfit?
I'd also like to understand its infrastructure better: after h2o.init(), does the data get transmitted to a server in H2O.ai's clusters, or does everything happen on my local laptop?
Thanks.
Upvotes: 0
Views: 862
Reputation: 8819
It's using the "leader" model, which is the #1 model on the leaderboard, ranked by a default metric for the ML task (binary classification, multiclass classification, regression). The leader model ID is stored at aml@leader@model_id.
The leader model, stored at aml@leader, is just a regular H2O model, so if you want to look at the parameters used, look at aml@leader@parameters for the parameters that you set, or aml@leader@allparameters for all the parameter values (including the ones that you did not set manually).
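As a minimal sketch of inspecting other models on the leaderboard (assuming `aml` is the H2OAutoML object returned by h2o.automl(), with a cluster already running):

```r
library(h2o)

# Convert the leaderboard to a regular R data.frame to see all model IDs
lb <- as.data.frame(aml@leaderboard)
print(lb$model_id)

# Retrieve any model by its ID, e.g. the model ranked #2,
# and inspect its hyperparameters the same way as the leader
m <- h2o.getModel(lb$model_id[2])
m@allparameters
```

Every model retrieved this way is a regular H2O model, so the same @parameters / @allparameters slots apply.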
The validation_frame is used to tune the individual models via early stopping, so the validation error will always be overly optimistic compared to the test error, which will be a good estimate of the generalization error.
The third question is out of scope for this post, but I'll answer it anyway. When you use H2O and start the cluster using h2o.init(), you are running everything locally on your laptop. If you start an H2O cluster somewhere else, such as on Amazon EC2 or your own remote servers, you can pass the IP address of that server to the h2o.init() command using the ip argument to connect to it, and the computations will be run on that remote machine. Either way, the servers are entirely under your control: there is no "H2O Cloud" owned by H2O.ai that does remote processing.
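A short sketch of both setups (the IP address below is a hypothetical placeholder, not a real H2O.ai server):

```r
library(h2o)

# Start a local H2O cluster; all computation happens on this laptop
h2o.init()

# Or connect to an H2O cluster you already started on a remote machine,
# e.g. an EC2 instance you control (hypothetical address)
h2o.init(ip = "10.0.0.5", port = 54321)
```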
Upvotes: 3