Reputation: 1089
I trained machine learning models using the Spark RDD-based API (the mllib package) in version 1.5.2, say "Mymodel123":
org.apache.spark.mllib.tree.model.RandomForestModel Mymodel123 = ....;
Mymodel123.save(sc, "path"); // sc is the SparkContext, "path" the save location
Now I'm using the Spark Dataset-based API (the ml package) in 2.2.0. Is there any way to load those models (Mymodel123) using the Dataset-based API?
org.apache.spark.ml.classification.RandomForestClassificationModel newModel =
    org.apache.spark.ml.classification.RandomForestClassificationModel.load("path");
Upvotes: 0
Views: 473
Reputation: 35249
There is no public API that can do that. However, the new RandomForestClassificationModel wraps the old mllib model and provides a private[ml] method, fromOld, which can be used to convert mllib models to ml models:
/** Convert a model from the old API */
private[ml] def fromOld(
    oldModel: OldRandomForestModel,
    parent: RandomForestClassifier,
    categoricalFeatures: Map[Int, Int],
    numClasses: Int,
    numFeatures: Int = -1): RandomForestClassificationModel = {
  require(oldModel.algo == OldAlgo.Classification, "Cannot convert RandomForestModel" +
    s" with algo=${oldModel.algo} (old API) to RandomForestClassificationModel (new API).")
  val newTrees = oldModel.trees.map { tree =>
    // parent for each tree is null since there is no good way to set this.
    DecisionTreeClassificationModel.fromOld(tree, null, categoricalFeatures)
  }
  val uid = if (parent != null) parent.uid else Identifiable.randomUID("rfc")
  new RandomForestClassificationModel(uid, newTrees, numFeatures, numClasses)
}
so it is not impossible. In Java you can call it directly (Java doesn't respect Scala's package-private modifiers); in Scala you'll have to put the adapter code in the org.apache.spark.ml package, as in the sketch below.
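A minimal sketch of such an adapter, assuming the old model was saved with the mllib API as in the question; the object name RandomForestModelLoader and the categoricalFeatures/numClasses arguments are illustrative placeholders, not part of Spark's API:

// Must live in the org.apache.spark.ml package (or a subpackage) so that
// the private[ml] fromOld method is visible to this code.
package org.apache.spark.ml

import org.apache.spark.SparkContext
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.mllib.tree.model.{RandomForestModel => OldRandomForestModel}

object RandomForestModelLoader {

  /**
   * Loads an old mllib RandomForestModel from `path` and converts it to a
   * new ml RandomForestClassificationModel via the package-private fromOld.
   * numClasses must match the number of classes the old model was trained with.
   */
  def loadOldAsNew(
      sc: SparkContext,
      path: String,
      categoricalFeatures: Map[Int, Int],
      numClasses: Int): RandomForestClassificationModel = {
    val oldModel = OldRandomForestModel.load(sc, path)
    // parent is null because there is no RandomForestClassifier estimator
    // to attach; fromOld then generates a random uid for the new model.
    RandomForestClassificationModel.fromOld(
      oldModel, null, categoricalFeatures, numClasses)
  }
}

Compile this into your project and call it like any other helper (numClasses = 2 here assumes a binary classifier trained on continuous features):

// sc: SparkContext (e.g. spark.sparkContext)
val newModel = org.apache.spark.ml.RandomForestModelLoader
  .loadOldAsNew(sc, "path", categoricalFeatures = Map.empty, numClasses = 2)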
Upvotes: 1