Stéphane Davy

Reputation: 21

Unable to run transform in MLeap runtime from Spark model

I'm currently testing MLeap in order to perform predictions with a Spark model. To do that, I first implemented the Spark linear regression example described here: https://spark.apache.org/docs/2.3.0/ml-classification-regression.html#linear-regression I was able to save the model in an MLeap bundle and reuse it in another Spark context. Now I'd like to use this bundle in an MLeap runtime, but I'm facing some casting issues that keep it from working correctly.
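For reference, the export on the Spark side looked roughly like this. This is a minimal sketch assuming MLeap's Spark support module; lrModel and training are the names used in the linked Spark example:

    import ml.combust.bundle.BundleFile
    import ml.combust.mleap.spark.SparkSupport._
    import org.apache.spark.ml.bundle.SparkBundleContext
    import resource._

    // Capture the transformed dataset so the bundle records the output schema
    val sbc = SparkBundleContext().withDataset(lrModel.transform(training))

    (for (bf <- managed(BundleFile("jar:file:/tmp/spark-lrModel.zip"))) yield {
      lrModel.writeBundle.save(bf)(sbc)
    }).tried.get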

The error comes from the schema definition: the "features" part is a set of columns that are grouped together, and I've tried many things with no luck. First, I declared the column as a list:

       val dataSchema = StructType(Seq(
                          StructField("label", ScalarType.Double),
                          StructField("features", ListType.Double)
                        )).get

This gives me:

java.lang.IllegalArgumentException: Cannot cast ListType(double,true) to TensorType(double,Some(WrappedArray(10)),true)

So I tried:

       val dataSchema = StructType(Seq(
                          StructField("label", ScalarType.Double),
                          StructField("features", TensorType.Double(10))
                        )).get

but it gave me

java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to ml.combust.mleap.tensor.Tensor
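If I read this exception correctly, the features value is being passed as a plain Scala List where the runtime expects an ml.combust.mleap.tensor.Tensor. A sketch of what a tensor-valued row might look like (not verified end to end), assuming the Tensor.denseVector helper from the mleap-tensor module:

    import ml.combust.mleap.tensor.Tensor

    // Wrap the ten feature values in a dense tensor so the runtime value
    // matches the TensorType.Double(10) declared in the schema
    val features = Tensor.denseVector(Array(
      0.4551273600657362, 0.36644694351969087, -0.38256108933468047,
      -0.4458430198517267, 0.33109790358914726, 0.8067445293443565,
      -0.2624341731773887, -0.44850386111659524, -0.07269284838169332,
      0.5658035575800715))
    val data = Seq(Row(-9.490009878824548, features))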

Here is the whole piece of code:

    // Imports added for completeness; paths as in the MLeap 0.10+ runtime
    import ml.combust.bundle.BundleFile
    import ml.combust.mleap.core.types._
    import ml.combust.mleap.runtime.MleapSupport._
    import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
    import resource._

    val dataSchema = StructType(Seq(
                       StructField("label", ScalarType.Double),
                       StructField("features", TensorType.Double(10))
                     )).get

    val data = Seq(Row(-9.490009878824548, Seq(
      0.4551273600657362, 0.36644694351969087, -0.38256108933468047,
      -0.4458430198517267, 0.33109790358914726, 0.8067445293443565,
      -0.2624341731773887, -0.44850386111659524, -0.07269284838169332,
      0.5658035575800715)))

    val bundle = (for (bundleFile <- managed(BundleFile("jar:file:/tmp/spark-lrModel.zip"))) yield {
      bundleFile.loadMleapBundle().get
    }).tried.get

    val model = bundle.root
    val to_test = DefaultLeapFrame(dataSchema, data)
    val res = model.transform(to_test).get // => here is the place that raises the exception

I'm a little bit lost now with this type mapping. Any idea?

Thanks,

Stéphane

Upvotes: 1

Views: 372

Answers (1)

Stéphane Davy

Reputation: 21

Answering my own question: it's not a good idea to start from the Spark examples, because the data is already in libsvm format, so the features are already gathered into a single vector. It looks like the mapping is not possible in that situation. But starting from a basic example with a full pipeline (VectorAssembler + ML stage), it works fine, as sketched below.
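To illustrate, here is a minimal sketch of such a pipeline; the column names f0, f1, f2 and the DataFrame df are hypothetical placeholders:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.LinearRegression

    // The VectorAssembler is serialized as part of the bundle, so the MLeap
    // runtime can accept plain scalar columns and build the vector itself
    val assembler = new VectorAssembler()
      .setInputCols(Array("f0", "f1", "f2"))
      .setOutputCol("features")
    val lr = new LinearRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")
    val pipelineModel = new Pipeline().setStages(Array(assembler, lr)).fit(df)

On the MLeap side, the leap frame then only needs scalar columns, which avoids the tensor mapping problem entirely:

    val schema = StructType(Seq(
      StructField("f0", ScalarType.Double),
      StructField("f1", ScalarType.Double),
      StructField("f2", ScalarType.Double)
    )).get
    val frame = DefaultLeapFrame(schema, Seq(Row(0.1, -0.5, 2.0)))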

Upvotes: 1
