Reputation: 71
I want to train a logistic regression model using Apache Spark in Java. As a first step, I would like to train the model just once and save the model parameters (intercept and coefficients), then use those saved parameters to score at a later point in time. I am able to save the model in a Parquet file using the following code:
LogisticRegressionModel trainedLRModel = logReg.fit(data);
trainedLRModel.write().overwrite().save("mypath");
When I load the model to score, I get the following error.
LogisticRegression lr = new LogisticRegression();
lr.load("//saved_model_path");
Exception in thread "main" java.lang.NoSuchMethodException: org.apache.spark.ml.classification.LogisticRegressionModel.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:325)
at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:215)
at org.apache.spark.ml.classification.LogisticRegression$.load(LogisticRegression.scala:672)
at org.apache.spark.ml.classification.LogisticRegression.load(LogisticRegression.scala)
Is there a way to train and save a model and then evaluate (score) it later? I am using Spark ML 2.1.0 in Java.
Upvotes: 4
Views: 4358
Reputation: 1304
I faced the same problem with PySpark 2.1.1. When I changed from LogisticRegression to LogisticRegressionModel, everything worked well.
LogisticRegression.load("/model/path") # does not work
LogisticRegressionModel.load("/model/path") # works well
Upvotes: 6
Reputation: 74679
TL;DR Use LogisticRegressionModel.load.
load(path: String): LogisticRegressionModel Reads an ML instance from the input path, a shortcut of read.load(path).
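In the Java API that the question uses, the fix boils down to loading through the model class. A minimal sketch; "/saved_model_path" is the placeholder path from the question:

import org.apache.spark.ml.classification.LogisticRegressionModel;

// Load the fitted model, not the LogisticRegression estimator. Only the model
// class carries the learned parameters, which is why loading a model directory
// through the estimator fails with NoSuchMethodException.
LogisticRegressionModel trainedLRModel =
    LogisticRegressionModel.load("/saved_model_path");

// The saved parameters the question asked about are right on the loaded model.
System.out.println("intercept = " + trainedLRModel.intercept());
System.out.println("coefficients = " + trainedLRModel.coefficients());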
As a matter of fact, as of Spark 2.0.0 the recommended way to use Spark MLlib, including the LogisticRegression estimator, is the brand new and shiny Pipeline API.
import org.apache.spark.ml.classification._
val lr = new LogisticRegression()

import org.apache.spark.ml.feature._
// feature stages: tokenize the raw text, then hash tokens into a feature vector
val tok = new Tokenizer().setInputCol("body")
val hashTF = new HashingTF().setInputCol(tok.getOutputCol).setOutputCol("features")

import org.apache.spark.ml._
val pipeline = new Pipeline().setStages(Array(tok, hashTF, lr))

// training dataset (toDF is available in spark-shell via the imported implicits)
val emails = Seq(("hello world", 1)).toDF("body", "label")

// fit the whole pipeline and persist the fitted model
val model = pipeline.fit(emails)
model.write.overwrite.save("mypath")

// later: load the fitted pipeline back as a PipelineModel
val loadedModel = PipelineModel.load("mypath")
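To actually score later from the question's Java setup, the loaded PipelineModel is applied with transform. A minimal, self-contained sketch, assuming the fitted pipeline was saved to "mypath" as above; the class name and the sample row are illustrative only:

import java.util.Arrays;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ScoreSavedPipeline {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("score-saved-pipeline")
        .master("local[*]")
        .getOrCreate();

    // Load the fitted pipeline saved earlier ("mypath" as in the example above).
    PipelineModel model = PipelineModel.load("mypath");

    // A tiny scoring DataFrame with the same input column the pipeline expects.
    StructType schema = new StructType().add("body", DataTypes.StringType);
    Dataset<Row> newData = spark.createDataFrame(
        Arrays.asList(RowFactory.create("hello world")), schema);

    // transform() runs every stage: tokenizer, hashing, then the fitted
    // logistic regression model's prediction.
    model.transform(newData)
        .select("body", "probability", "prediction")
        .show(false);

    spark.stop();
  }
}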
Upvotes: 2