Reputation: 13
What is the difference between the two arguments nfolds and train_samples_per_iteration, and is one of them more important than the other for determining the optimal hyperparameters?
Also, is it necessary to scale the training and testing sets before training the model?
Would it be important to convert the response variable to a factor?
Upvotes: 0
Views: 50
Reputation: 28968
nfolds sets the number of folds and is only specified when you want to do cross-validation. If you are not doing cross-validation and are instead using a train/valid/test data split, you can ignore it.
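To make the idea concrete, here is a minimal pure-Python sketch of what a k-fold split means. It is only an illustration: the function name modulo_folds is made up for this example, and it assigns rows to folds by row index modulo the fold count, which is just one possible scheme and not necessarily what H2O does by default.

```python
def modulo_folds(n_rows, nfolds):
    """Yield (train_rows, holdout_rows) index lists, one pair per fold.

    Hypothetical helper for illustration only: row i goes to fold i % nfolds.
    """
    for k in range(nfolds):
        holdout = [i for i in range(n_rows) if i % nfolds == k]
        train = [i for i in range(n_rows) if i % nfolds != k]
        yield train, holdout


# With nfolds=5, five models are built, each one scored on the
# fifth of the rows it did not see during training.
splits = list(modulo_folds(n_rows=10, nfolds=5))
for train_rows, holdout_rows in splits:
    print(len(train_rows), "training rows, holdout:", holdout_rows)
```

Every row ends up in exactly one holdout set, which is why cross-validation can stand in for a separate validation split.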
train_samples_per_iteration sets how many training samples are processed per iteration, and so decides how often scoring is done. The default (-2) is to let H2O decide, which is normally a good idea. Only touch it if you feel that a significant portion of training time is being wasted on scoring the model too frequently or, at the other extreme, that it is not scoring often enough (and so missing chances to do early stopping).
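As a rough sketch of how the two arguments sit side by side in the Python API (assuming you are using H2ODeepLearningEstimator; the helper names dl_params and train are made up for this example, and train needs the h2o package plus a cluster already started with h2o.init()):

```python
def dl_params(cross_validate, nfolds=5):
    """Build keyword arguments for the estimator (hypothetical helper)."""
    # -2 is the "let H2O decide" setting, i.e. the default behaviour.
    params = {"train_samples_per_iteration": -2}
    if cross_validate:
        # Only set nfolds when cross-validation is wanted.
        params["nfolds"] = nfolds
    return params


def train(train_frame, y, valid_frame=None, **params):
    """Train a deep learning model on H2OFrames (hypothetical helper)."""
    # Imported here so the rest of the sketch runs without h2o installed.
    from h2o.estimators import H2ODeepLearningEstimator

    model = H2ODeepLearningEstimator(**params)
    x = [col for col in train_frame.columns if col != y]
    model.train(x=x, y=y, training_frame=train_frame,
                validation_frame=valid_frame)
    return model


print(dl_params(cross_validate=True))   # cross-validation setup
print(dl_params(cross_validate=False))  # train/valid/test setup
```

With a train/valid/test split you would call something like train(train_frame, "response", valid_frame=valid_frame, **dl_params(False)), where the frames and the "response" column name are placeholders for your own data.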
Also, is it necessary to scale the training and testing sets before training the model?
No, H2O will do this by default (the standardize option is on unless you turn it off).
Would it be important to convert the response variable to a factor?
Yes. If the response variable is one of a set of categories, make sure H2O has recognized it as a factor. If it recognizes it as a numerical type it will build a regression model instead.
(It normally does the right thing automatically, but it can miss your intention if your categories are numbers, e.g. "0" for no, "1" for yes.)
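In the Python API the check and conversion might look like the sketch below (in R the equivalent conversion is as.factor()). The function name ensure_categorical_response is made up for this example; it assumes frame is an H2OFrame and relies on its types property and the column's asfactor() method.

```python
def ensure_categorical_response(frame, y):
    """Convert column y to a factor if H2O parsed it as something else.

    Hypothetical helper: `frame` is assumed to be an H2OFrame and `y`
    the name of the response column.
    """
    # frame.types maps column names to type strings; factors show as "enum".
    if frame.types[y] != "enum":
        frame[y] = frame[y].asfactor()
    return frame
```

Call it on the training frame before training, and on any validation or test frame too, so the response has the same type everywhere.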
Upvotes: 1