AdrianD

Reputation: 51

Save and load all h2o cross-validation models in R

I am trying to find an easy way to save all the cross validation models produced by h2o using R.

Running any kind of model with nfolds = 5, I can see each CV model listed in the web interface (localhost:54321), looking something like this:

model_id
model_id_cv_1
model_id_cv_2
model_id_cv_3
model_id_cv_4
model_id_cv_5

I've used this to save it: h2o.saveModel(model_id, path="mypath") gives a

But h2o.saveModel(model_id_cv_1, path="mypath") But when I reload it I lose all the cross-validated models.

It does seem possible to save each CV model as a POJO via the web interface, but I'd rather be able to do this programmatically in R. It seems that there used to be a 'save_cv' option in earlier versions of h2o.saveModel(), but it appears to have been removed.

Is this possible?

Upvotes: 2

Views: 804

Answers (1)

AvkashChauhan

Reputation: 20566

When you save a model that was trained with cross-validation using its main model ID, the saved model contains all of the cross-validated models. If you instead save the individual cross-validated models to disk, each one is treated as a standalone model and you will not see them grouped together.

Here is an example:

Let's build a GBM model with 5 folds:

library(h2o)
h2o.init()  # start or connect to a local H2O cluster
prostate_df = h2o.importFile("https://raw.githubusercontent.com/Avkash/mldl/master/data/prostate.csv")
response = "CAPSULE"
features = setdiff(h2o.colnames(prostate_df), response)
prostate_gbm_cv5_model = h2o.gbm(x = features, y = response, training_frame = prostate_df, nfolds = 5)

You can get all the models from this object:

h2o.cross_validation_models(prostate_gbm_cv5_model)

You can access individual CV models as below:

h2o.cross_validation_models(prostate_gbm_cv5_model)[[1]]
h2o.cross_validation_models(prostate_gbm_cv5_model)[[1]]@model_id

You can get the total number of CV models like this:

length(h2o.cross_validation_models(prostate_gbm_cv5_model))
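
To see how these objects map onto the model_id_cv_1 ... model_id_cv_5 entries listed in the web interface, you could collect the ID of every fold model in one call. This is only a minimal sketch: cv_model_ids is a hypothetical helper name (not part of h2o), and it assumes the model was trained with nfolds set and that the CV models were kept.

```r
library(h2o)

# Hypothetical helper (not part of the h2o package): return the model_id
# of every cross-validation fold model attached to `model`.
cv_model_ids = function(model) {
  cv_models = h2o.cross_validation_models(model)
  # Each element is a regular H2O model object with a model_id slot.
  sapply(cv_models, function(m) m@model_id)
}

# Usage, assuming prostate_gbm_cv5_model was trained as shown above:
# cv_model_ids(prostate_gbm_cv5_model)
```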

Let's save the model to disk:

h2o.saveModel(object = prostate_gbm_cv5_model, path = "/Users/avkashchauhan/Downloads")

Let's load the model from disk:

model_from_disk = h2o.loadModel("/Users/avkashchauhan/Downloads/GBM_model_R_1512067532473_2966")

You will get all the CV models here:

h2o.cross_validation_models(model_from_disk)

Get CV models count:

length(h2o.cross_validation_models(model_from_disk))

Access CV model individually:

h2o.cross_validation_models(model_from_disk)[[1]]
h2o.cross_validation_models(model_from_disk)[[1]]@model_id
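
If you do want each fold model as its own file (keeping in mind that they will then load as unrelated standalone models), one possible approach is to loop over the list and call h2o.saveModel() on each element. This is a rough sketch rather than a tested recipe: save_cv_models is a hypothetical helper name, "mypath" is the placeholder directory from the question, and the directory is assumed to be writable by the H2O server.

```r
library(h2o)

# Hypothetical helper (not part of the h2o package): save every fold model
# of a cross-validated H2O model into `dir` and return the saved paths.
save_cv_models = function(model, dir) {
  cv_models = h2o.cross_validation_models(model)
  # h2o.saveModel() returns the full path of the file it wrote.
  sapply(cv_models, function(m) {
    h2o.saveModel(object = m, path = dir, force = TRUE)
  })
}

# Usage, assuming prostate_gbm_cv5_model was trained as shown above:
# cv_paths = save_cv_models(prostate_gbm_cv5_model, "mypath")
# reloaded = lapply(cv_paths, h2o.loadModel)
```

Each reloaded object is an independent model, so you would need to keep track of which files belong to which main model yourself.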

Upvotes: 1
