Reputation: 51
I am trying to find an easy way to save all the cross validation models produced by h2o using R.
Running any kind of model with nfolds = 5, I can see each CV model listed in the web interface (localhost:54321), looking something like this:
model_id
model_id_cv_1
model_id_cv_2
model_id_cv_3
model_id_cv_4
model_id_cv_5
I've used this to save it: h2o.saveModel(model_id, path="mypath")
I also tried h2o.saveModel(model_id_cv_1, path="mypath"). But when I reload the saved model I lose all the cross-validated models.
It does seem possible to save each CV model as a POJO via the web interface, but I'd rather be able to do this programmatically in R. It seems that there used to be a 'save_cv' option in earlier versions of h2o.saveModel(), but this seems to have been removed.
Is this possible?
Upvotes: 2
Views: 804
Reputation: 20566
When you save a model by its main model ID and that model was built with cross-validation, the saved model includes all of its cross-validated models. If you instead save the individual cross-validated models to disk, each one is treated as a separate model and you will not see them grouped together.
Here is an example:
Let's build a GBM model with 5 folds:
prostate_df = h2o.importFile("https://raw.githubusercontent.com/Avkash/mldl/master/data/prostate.csv")
response = "CAPSULE"
features = setdiff(h2o.colnames(prostate_df), response)
prostate_gbm_cv5_model = h2o.gbm(x = features, y = response, training_frame = prostate_df, nfolds = 5)
You can get all the models from this object:
h2o.cross_validation_models(prostate_gbm_cv5_model)
You can access individual CV models as below:
h2o.cross_validation_models(prostate_gbm_cv5_model)[[1]]
h2o.cross_validation_models(prostate_gbm_cv5_model)[[1]]@model_id
You can get the total number of cross-validation models like this:
length(h2o.cross_validation_models(prostate_gbm_cv5_model))
Let's save the model to disk:
h2o.saveModel(object = prostate_gbm_cv5_model, path = "/Users/avkashchauhan/Downloads")
Let's load the model from disk:
model_from_disk = h2o.loadModel("/Users/avkashchauhan/Downloads/GBM_model_R_1512067532473_2966")
You can get all the CV models from the loaded model:
h2o.cross_validation_models(model_from_disk)
Get CV models count:
length(h2o.cross_validation_models(model_from_disk))
Access the CV models individually:
h2o.cross_validation_models(model_from_disk)[[1]]
h2o.cross_validation_models(model_from_disk)[[1]]@model_id
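If you do want each fold's model as its own file, one possible approach is to loop over the list returned by h2o.cross_validation_models() and call h2o.saveModel() on each element. This is only a minimal sketch: the iris data, the seed, and the out_dir location are illustrative assumptions rather than part of the example above, and it assumes a local H2O cluster that can write to the R temp directory. As noted earlier, models saved this way load back as independent models.

```r
library(h2o)
h2o.init()

# Small built-in dataset, used here purely for illustration (assumption)
iris_hf <- as.h2o(iris)
features <- setdiff(colnames(iris), "Species")

# keep_cross_validation_models is set explicitly so the fold models are retained
model <- h2o.gbm(x = features, y = "Species", training_frame = iris_hf,
                 nfolds = 5, keep_cross_validation_models = TRUE, seed = 1)

# List of the per-fold models
cv_models <- h2o.cross_validation_models(model)

# Hypothetical output directory; replace with your own path
out_dir <- file.path(tempdir(), "cv_models")

# Save each fold model; h2o.saveModel() returns the path of the saved file
saved_paths <- vapply(
  cv_models,
  function(m) h2o.saveModel(object = m, path = out_dir, force = TRUE),
  character(1)
)
print(saved_paths)
```

Each entry of saved_paths can then be passed to h2o.loadModel() to reload that fold's model on its own.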
Upvotes: 1