My goal is to export an h2o model trained on spark with scala (using sparkling-water), such that I can import it in an application without Spark.
Thus the importing application should depend only on the hex-genmodel package. I'm therefore using ModelSerializationSupport to export and MojoModel.load to import:
val gbmParams = new GBMParameters()
gbmParams._train = train
gbmParams._response_column = "target"
gbmParams._ntrees = 5
gbmParams._valid = valid
gbmParams._nfolds = 3
gbmParams._min_rows = 1
gbmParams._distribution = DistributionFamily.multinomial
val gbm = new GBM(gbmParams)
val gbmModel = gbm.trainModel.get
val mojoPath = "./model.zip"
ModelSerializationSupport.exportMOJOModel(gbmModel, new File(mojoPath).toURI, force = true)
val simpleModel = new EasyPredictModelWrapper(MojoModel.load(mojoPath))
Fails with
error in opening zip file
java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:220)
at java.util.zip.ZipFile.<init>(ZipFile.java:150)
at java.util.zip.ZipFile.<init>(ZipFile.java:121)
at hex.genmodel.ZipfileMojoReaderBackend.<init>(ZipfileMojoReaderBackend.java:13)
at hex.genmodel.MojoModel.load(MojoModel.java:33)
...
It seems that the MOJO exporter doesn't write the format that hex.genmodel expects (apparently a zip file).
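One way to test that suspicion is to check whether the exported file starts with the zip signature. The following is a minimal sketch using only the JDK; `ZipCheck` and `looksLikeZip` are hypothetical names introduced for illustration and are not part of H2O or sparkling-water.

```scala
import java.io.{DataInputStream, File, FileInputStream}

object ZipCheck {
  // A non-empty zip archive starts with the local file header signature "PK\u0003\u0004".
  private val ZipMagic: Array[Byte] = Array(0x50, 0x4B, 0x03, 0x04).map(_.toByte)

  /** Returns true if `file` is a regular file whose first four bytes are the zip signature. */
  def looksLikeZip(file: File): Boolean =
    if (!file.isFile || file.length < ZipMagic.length) false
    else {
      val in = new DataInputStream(new FileInputStream(file))
      try {
        val header = new Array[Byte](ZipMagic.length)
        in.readFully(header)
        header.sameElements(ZipMagic)
      } finally in.close()
    }
}
```

Calling `ZipCheck.looksLikeZip(new File(mojoPath))` right after the export shows whether the file is something `java.util.zip.ZipFile` could open at all. A directory or a file with a different header returns `false`.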
Running on sparkling-water 2.1.23 (2.1.24 fails when building the cluster, as reported at https://0xdata.atlassian.net/browse/SW-776) and Spark 2.1.
-- update:
Using the ModelSerializationSupport class to load its own export fails too, with the same exception:
ModelSerializationSupport.loadMOJOModel(new File(mojoPath).toURI)
H2OModel export and load
Loading back as H2OModel (thus with sparkling-water) does work:
val h2oModelPath = "./model_h2o"
ModelSerializationSupport.exportH2OModel(gbmModel, new File(h2oModelPath).toURI, force = true)
val loadedModel: GBMModel = ModelSerializationSupport.loadH2OModel(new File(h2oModelPath).toURI)
H2OMOJOModel export and load
Loading it back with H2OMOJOModel does work (copied from the implementation of H2OGBM):
val mojoModel = new H2OMOJOModel(ModelSerializationSupport.getMojoData(gbmModel))
mojoModel.write.overwrite.save(mojoPath)
H2OMOJOModel.load(mojoPath)
H2OGBM export with MojoModel import
Attempting to import it using the regular MojoModel fails, though:
val gbm = new H2OGBM(gbmParams)(h2oContext, myspark.sqlContext)
val gbmModel = gbm.trainModel(gbmParams)
val mojoPath = "./models.zip"
gbmModel.write.overwrite.save(mojoPath)
MojoModel.load(mojoPath)
with the following exception:
./models.zip/model.ini (No such file or directory)
java.io.FileNotFoundException: ./models.zip/model.ini (No such file or directory)
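The path in this exception, ./models.zip/model.ini, suggests that ./models.zip was treated as a directory and that model.ini was looked up inside it. That would fit a writer that saves a directory rather than a single archive, though this is an assumption and not something the trace proves. A minimal sketch for checking what actually sits at the path, using only the JDK (`PathKind` and `describe` are hypothetical names):

```scala
import java.io.File

object PathKind {
  /** Describes what is stored at `path`: a directory (with its entries), a regular file, or nothing. */
  def describe(path: String): String = {
    val f = new File(path)
    if (f.isDirectory) {
      // File.list() can return null on I/O errors, so wrap it in Option
      val names = Option(f.list()).map(_.toSeq.sorted).getOrElse(Seq.empty)
      s"directory containing: ${names.mkString(", ")}"
    } else if (f.isFile) s"regular file of ${f.length} bytes"
    else "missing"
  }
}
```

Running `println(PathKind.describe("./models.zip"))` after the save shows whether the path holds a directory or a single file, and in the directory case which entries were written.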
Upvotes: 1
Views: 1035
The solution is actually explained by getMojoModel on ModelSerializationSupport, which accepts either a Model[_,_,_] or an Array[Byte].
The implementation of getMojoModel(Model[_,_,_]) writes the result of getMojoData(Model[_,_,_]) to a byte array and then reads the model back from that byte array.
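To illustrate the byte-array round trip in isolation, here is a minimal sketch that writes a zip archive into memory and reads it back without touching the file system. It uses only the JDK zip classes and does not call any H2O API. `InMemoryZip` is a hypothetical name, the entry name model.ini is borrowed from the exception in the question, and the entry content is a placeholder.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object InMemoryZip {
  /** Writes a single-entry zip archive into a byte array. */
  def write(entryName: String, content: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val zip = new ZipOutputStream(buffer)
    try {
      zip.putNextEntry(new ZipEntry(entryName))
      zip.write(content.getBytes(UTF_8))
      zip.closeEntry()
    } finally zip.close() // closing also writes the central directory
    buffer.toByteArray
  }

  /** Reads the entry names back from the byte array. */
  def entryNames(bytes: Array[Byte]): List[String] = {
    val zip = new ZipInputStream(new ByteArrayInputStream(bytes))
    try Iterator.continually(zip.getNextEntry).takeWhile(_ != null).map(_.getName).toList
    finally zip.close()
  }
}
```

The same bytes can be written to a file or sent over the network, which is why a stream-based reader can work in places where a path-based loader cannot.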
A quick test as follows works:
val config = new EasyPredictModelWrapper.Config()
config.setModel(ModelSerializationSupport.getMojoModel(gbmModel))
config.setConvertUnknownCategoricalLevelsToNa(true)
val easyPredictModelWrapper = new EasyPredictModelWrapper(config)
Thus we can now reproduce this ourselves, without using the ModelSerializationSupport class (as it is part of sparkling-water).
First store the mojo data to a file:
import java.io.FileOutputStream
import java.nio.file.Files

// Temporary file that will hold the MOJO bytes; path.toString gives its location
val path = Files.createTempFile("model", ".mojo")
path.toFile.deleteOnExit()

val outputStream = new FileOutputStream(path.toFile)
try {
  // Stream the model's MOJO representation straight into the file
  gbmModel.getMojo.writeTo(outputStream)
} finally {
  outputStream.close()
}
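And then read the bytes (in another Scala application):
import java.io.FileInputStream
import hex.genmodel.{ModelMojoReader, MojoReaderBackendFactory}
import hex.genmodel.easy.EasyPredictModelWrapper

// `path` is the location of the file written in the previous step
val is = new FileInputStream(path.toFile)
val mojoModel = try {
  val reader = MojoReaderBackendFactory.createReaderBackend(is, MojoReaderBackendFactory.CachingStrategy.MEMORY)
  ModelMojoReader.readFrom(reader)
} finally is.close()
val config = new EasyPredictModelWrapper.Config()
config.setModel(mojoModel)
config.setConvertUnknownCategoricalLevelsToNa(true)
val easyPredictModelWrapper = new EasyPredictModelWrapper(config)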
Upvotes: 0