AutoML builds two stacked ensembles: one that includes "all" models and another that uses a "best of family" subset.
Is there any way, without manually saving the base models and the stacked ensemble aggregator (metalearner) to disk, to store the "best of family" ensemble as a standalone black box, reload it, and use it, without requiring literally 1000 less valuable learners to exist in the same space?
If so, how do I do that?
While AutoML runs, everything stays in memory: nothing is written to disk unless you explicitly save one of the models (or some other object) to disk.
If you just want the "Best of Family" stacked ensemble, all you have to do is save that binary model. When you save a stacked ensemble, it saves all the required pieces (base models and metalearner) for you. You can then reload it later into another H2O cluster when you're ready to make predictions (if you save a binary model, make sure you load it with the same version of H2O).
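Python Example:
```python
import h2o
h2o.init()  # connect to the running H2O cluster where AutoML was run
bestoffamily = h2o.get_model('StackedEnsemble_BestOfFamily_0_AutoML_20171121_012135')
model_path = h2o.save_model(bestoffamily, path="/home/users/me/mymodel")  # returns the saved file's path
```
R Example:
```r
library(h2o)
h2o.init()  # connect to the running H2O cluster where AutoML was run
bestoffamily <- h2o.getModel('StackedEnsemble_BestOfFamily_0_AutoML_20171121_012135')
model_path <- h2o.saveModel(bestoffamily, path = "/home/users/me/mymodel")  # returns the saved file's path
```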
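The model ID in these examples is hardcoded from one particular AutoML run. A minimal sketch of looking it up from the leaderboard instead, assuming the default AutoML naming convention (Best of Family ensemble IDs start with `StackedEnsemble_BestOfFamily`) and a finished `H2OAutoML` object called `aml`. `pick_best_of_family` and `save_best_of_family` are hypothetical helper names, not part of the H2O API.

```python
def pick_best_of_family(model_ids):
    """Return the first Best of Family stacked ensemble ID in `model_ids`.

    Assumes AutoML's naming convention (IDs start with
    "StackedEnsemble_BestOfFamily"). The leaderboard is sorted by
    performance, so the first match is the best-ranked one.
    """
    for model_id in model_ids:
        if model_id.startswith("StackedEnsemble_BestOfFamily"):
            return model_id
    raise ValueError("no Best of Family stacked ensemble found")


def save_best_of_family(aml, directory):
    """Save only the Best of Family ensemble from a finished H2OAutoML run.

    Needs a running H2O cluster. h2o is imported here rather than at the
    top so that pick_best_of_family can be used without it.
    """
    import h2o

    # Leaderboard H2OFrame -> pandas DataFrame -> plain list of model IDs
    leaderboard = aml.leaderboard.as_data_frame()
    model_id = pick_best_of_family(leaderboard["model_id"].tolist())
    model = h2o.get_model(model_id)
    # Returns the full path of the saved binary model
    return h2o.save_model(model, path=directory)
```

Usage: `save_best_of_family(aml, "/home/users/me/mymodel")` returns the path to pass to `h2o.load_model()` later, so only the one ensemble needs to be kept on disk.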
Later on, you reload the stacked ensemble into memory using h2o.load_model() in Python or h2o.loadModel() in R.
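A sketch of the reload step in Python, assuming the model was saved with `h2o.save_model()` under its default file name (the model ID) and that the data to score is a CSV file. The helper names and paths are hypothetical; in practice, prefer the exact path that `h2o.save_model()` returned.

```python
import os


def saved_model_path(directory, model_id):
    """Where h2o.save_model() writes a model when no file name is given.

    Assumes the default file name, which is the model ID.
    """
    return os.path.join(directory, model_id)


def reload_and_predict(model_path, csv_path):
    """Reload a saved binary model into an H2O cluster and score a CSV.

    The cluster must run the same H2O version that saved the model.
    """
    import h2o

    h2o.init()  # connect to (or start) a local H2O cluster
    model = h2o.load_model(model_path)
    frame = h2o.import_file(csv_path)  # hypothetical scoring data
    return model.predict(frame)
```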
Alternatively, instead of using an H2O binary model, which requires an H2O cluster to be running at prediction time, you can use a MOJO model (a different model format). MOJOs take a bit more work to use, but they are faster and designed for production use. If you want to save a MOJO model instead, you can use the model's download_mojo() method in Python or h2o.saveMojo() in R.
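A sketch of the MOJO route in Python, assuming your H2O version can export stacked ensembles as MOJOs and that Java is installed on the scoring machine. `download_mojo(get_genmodel_jar=True)` also downloads `h2o-genmodel.jar`, and `h2o.mojo_predict_pandas()` uses that jar to score a pandas DataFrame without a running H2O cluster. The helper names are hypothetical.

```python
import os


def genmodel_jar_for(mojo_zip_path):
    """Path of the h2o-genmodel.jar saved next to the MOJO zip.

    Assumes download_mojo(get_genmodel_jar=True) with the default jar name.
    """
    return os.path.join(os.path.dirname(mojo_zip_path), "h2o-genmodel.jar")


def export_mojo(model, directory):
    """Save the model as a MOJO zip plus the scorer jar; returns the zip path."""
    return model.download_mojo(path=directory, get_genmodel_jar=True)


def score_with_mojo(pandas_df, mojo_zip_path):
    """Score a pandas DataFrame with the MOJO (needs Java, not an H2O cluster)."""
    import h2o

    return h2o.mojo_predict_pandas(
        pandas_df, mojo_zip_path, genmodel_jar_path=genmodel_jar_for(mojo_zip_path)
    )
```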