Reputation: 571
I have an MLlib decision tree model trained on a set of data. I want to be able to save and load the trained model whenever necessary, e.g. train on a million-row data set and save the result for future use. I found that with FileInputStream, FileOutputStream, ObjectInputStream and ObjectOutputStream I can save and load a linear model, because its constructor was made public, as below.
You can save the model to disk as follows:
import java.io.FileOutputStream
import java.io.ObjectOutputStream
val fos = new FileOutputStream("e:/model.obj")
val oos = new ObjectOutputStream(fos)
oos.writeObject(model)
oos.close()
and load it back in:
import java.io.FileInputStream
import java.io.ObjectInputStream
val fis = new FileInputStream("e:/model.obj")
val ois = new ObjectInputStream(fis)
val newModel = ois.readObject().asInstanceOf[org.apache.spark.mllib.classification.LogisticRegressionModel]
ois.close()
The above also works syntactically for DecisionTree, but I cannot call newModel.predict(), apparently because the decision tree constructors were not made public.
Does anyone know how I can save and load models like DecisionTree, RandomForest, SVM, etc.?
Upvotes: 0
Views: 1051
Reputation: 21700
You could use the .save method on the model to store it as Parquet files and load it back via .load on the companion object. This should be faster than plain Java serialization, which is often slow.
See https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.util.Saveable
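As a rough sketch of what that looks like for a decision tree, assuming Spark 1.3 or later with spark-mllib on the classpath: the object name, the helper saveAndReload, the toy training data and the path below are all made up for illustration, and the training parameters are arbitrary.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel

object SaveLoadTree {
  // Hypothetical helper: trains a tiny classifier on made-up data.
  def trainToyModel(sc: SparkContext): DecisionTreeModel = {
    val data = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(0.0, Vectors.dense(0.1, 0.9)),
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
      LabeledPoint(1.0, Vectors.dense(0.9, 0.1))
    ))
    // Arguments: input, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins
    DecisionTree.trainClassifier(data, 2, Map[Int, Int](), "gini", 3, 32)
  }

  // Hypothetical helper: writes the model under `path`, then reads it back
  // through the companion object's load method.
  def saveAndReload(sc: SparkContext, model: DecisionTreeModel, path: String): DecisionTreeModel = {
    model.save(sc, path)
    DecisionTreeModel.load(sc, path)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SaveLoadTree").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val model = trainToyModel(sc)
      // Made-up output directory; pick one that does not exist yet.
      val path = "target/tmp/myDecisionTreeModel"
      val sameModel = saveAndReload(sc, model, path)
      // The reloaded model can be used for prediction directly.
      println(sameModel.predict(Vectors.dense(1.0, 0.0)))
    } finally {
      sc.stop()
    }
  }
}
```

The same save/load pair should be available on the other MLlib models that implement Saveable, for example RandomForestModel and SVMModel, each loaded through its own companion object.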
Upvotes: 2