Extending DefaultParamsReadable and DefaultParamsWritable not allowing reading of custom model

Question

Good day,

I have been struggling for a few days to save a custom transformer that is part of a large pipeline of stages. The transformer is completely defined by its params. The estimator's fit method generates a matrix and then sets the transformer's params accordingly, so that I can use DefaultParamsReadable and DefaultParamsWritable to take advantage of the serialisation/deserialisation already present in util.ReadWrite.scala.

My summarised code is as follows (only the important parts are included):

...
import org.apache.spark.ml.util._
...

// trait to implement in Estimator and Transformer for params
trait NBParams extends Params {
  
  final val featuresCol = new Param[String](this, "featuresCol", "The input column")
  setDefault(featuresCol, "_tfIdfOut")

  final val labelCol = new Param[String](this, "labelCol", "The labels column")
  setDefault(labelCol, "P_Root_Code_Index")
  
  final val predictionsCol = new Param[String](this, "predictionsCol", "The output column")
  setDefault(predictionsCol, "NBOutput")
  
  final val ratioMatrix = new Param[DenseMatrix](this, "ratioMatrix", "The transformation matrix")
  
  def getfeaturesCol: String = $(featuresCol)  
  def getlabelCol: String = $(labelCol)
  def getPredictionCol: String = $(predictionsCol)  
  def getRatioMatrix: DenseMatrix = $(ratioMatrix) 
  
}


// Estimator
class CustomNaiveBayes(override val uid: String, val alpha: Double) 
  extends Estimator[CustomNaiveBayesModel] with NBParams with DefaultParamsWritable {

  def copy(extra: ParamMap): CustomNaiveBayes = {
    defaultCopy(extra)
  }

  def setFeaturesCol(value: String): this.type = set(featuresCol, value)

  def setLabelCol(value: String): this.type = set(labelCol, value)

  def setPredictionCol(value: String): this.type = set(predictionsCol, value)

  def setRatioMatrix(value: DenseMatrix): this.type = set(ratioMatrix, value)

  override def transformSchema(schema: StructType): StructType = {...}

  override def fit(ds: Dataset[_]): CustomNaiveBayesModel = {
    ...
    val model = new CustomNaiveBayesModel(uid)
    model
      .setRatioMatrix(ratioMatrix)
      .setFeaturesCol($(featuresCol))
      .setLabelCol($(labelCol))
      .setPredictionCol($(predictionsCol))
  }
}

// companion object for Estimator
object CustomNaiveBayes extends DefaultParamsReadable[CustomNaiveBayes]{
  override def load(path: String): CustomNaiveBayes = super.load(path)
}

// Transformer
class CustomNaiveBayesModel(override val uid: String) 
  extends Model[CustomNaiveBayesModel] with NBParams with DefaultParamsWritable {  
    
  def this() = this(Identifiable.randomUID("customnaivebayes"))
   
  def copy(extra: ParamMap): CustomNaiveBayesModel = {defaultCopy(extra)}
    
  def setFeaturesCol(value: String): this.type = set(featuresCol, value) 
    
  def setLabelCol(value: String): this.type = set(labelCol, value) 
    
  def setPredictionCol(value: String): this.type = set(predictionsCol, value) 
    
  def setRatioMatrix(value: DenseMatrix): this.type = set(ratioMatrix, value) 

  override def transformSchema(schema: StructType): StructType = {...}

  def transform(dataset: Dataset[_]): DataFrame = {...}
}


// companion object for Transformer
object CustomNaiveBayesModel extends DefaultParamsReadable[CustomNaiveBayesModel]

When I add this model to a pipeline and fit the pipeline, everything runs fine. Saving the pipeline also produces no errors. However, when I attempt to load the pipeline, I get the following error:

NoSuchMethodException: $line3b380bcad77e4e84ae25a6bfb1f3ec0d45.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$$$6fa979eb27fa6bf89c6b6d1b271932c$$$$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$CustomNaiveBayesModel.read()

To save the pipeline, which includes a number of other transformers related to NLP pre-processing, I run

fittedModelRootCode.write.save("path")

and to then load it (where the failure occurs) I run

import org.apache.spark.ml.PipelineModel
val fittedModelRootCode = PipelineModel.load("path")

The model itself appears to be working well, but I cannot afford to retrain it on a dataset every time I wish to use it. Does anyone have any idea why, even with the companion object, the read() method appears to be unavailable?

Notes:

I am running on Databricks Runtime 8.3 (Spark 3.1.1, Scala 2.12)
My model is in a separate package, so it is external to Spark
I have reproduced this based on a number of existing examples, all of which appear to work fine, so I am unsure why my code is failing
I am aware there is a Naive Bayes model available in Spark ML. However, I have been tasked with making a large number of customizations, so it is not worth modifying the existing version (plus I would like to learn how to get this right)

Any help would be greatly appreciated.
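
For context on the NoSuchMethodException above: as far as I can tell from ReadWrite.scala, the pipeline loader finds each stage's reader by looking up a static read() method on the stage class via reflection. Below is a minimal sketch of that kind of lookup in plain Scala, without Spark. TopLevelStage, Wrapper, NestedStage and hasStaticRead are hypothetical names made up for illustration, and the nesting inside Wrapper is only an assumed stand-in for the $iw wrappers that a notebook or REPL puts around classes defined in a cell.

```scala
import java.lang.reflect.Modifier

// Hypothetical top-level class with a companion object that defines read().
class TopLevelStage
object TopLevelStage { def read(): String = "reader" }

// Hypothetical wrapper, standing in (by assumption) for the $iw objects
// that a notebook or REPL generates around classes defined in a cell.
object Wrapper {
  class NestedStage
  object NestedStage { def read(): String = "reader" }
}

object ReflectDemo {
  // Returns true if `cls` exposes a public static, zero-argument read(),
  // which is the kind of method a reflective loader would invoke with a
  // null receiver. Returns false if no such method is found.
  def hasStaticRead(cls: Class[_]): Boolean =
    try {
      Modifier.isStatic(cls.getMethod("read").getModifiers)
    } catch {
      case _: NoSuchMethodException => false
    }

  def main(args: Array[String]): Unit = {
    println(s"TopLevelStage: ${hasStaticRead(classOf[TopLevelStage])}")
    println(s"Wrapper.NestedStage: ${hasStaticRead(classOf[Wrapper.NestedStage])}")
  }
}
```

When this is compiled with scalac 2.12 as ordinary top-level definitions, I would expect the lookup to succeed for TopLevelStage and fail for Wrapper.NestedStage, because the static forwarders for a companion's methods are only emitted for top-level classes. If that is what is happening here, it may be worth checking whether the classes behave differently when they are compiled into a JAR attached to the cluster rather than defined in a notebook cell.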
