I'm currently testing MLeap in order to run predictions with a Spark model. To do that, I first implemented the Spark linear regression example described at https://spark.apache.org/docs/2.3.0/ml-classification-regression.html#linear-regression and was able to save the model as an MLeap bundle and reuse it in another Spark context. Now I'd like to use this bundle in the MLeap runtime, but I'm facing some casting issues that keep it from working correctly.
The error comes from the schema definition:
val dataSchema = StructType(Seq(
StructField("label", ScalarType.Double),
StructField("features", ListType.Double)
)).get
The "features" field is a set of columns grouped into a single vector. I've tried several things, without success. With ListType.Double, as in the schema above, I get:
java.lang.IllegalArgumentException: Cannot cast ListType(double,true) to TensorType(double,Some(WrappedArray(10)),true)
So I tried:
val dataSchema = StructType(Seq(
StructField("label", ScalarType.Double),
StructField("features", TensorType.Double(10))
)).get
but it gave me
java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to ml.combust.mleap.tensor.Tensor
Here is the whole piece of code:
import ml.combust.bundle.BundleFile
import ml.combust.mleap.core.types.{ScalarType, StructField, StructType, TensorType}
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
import resource._

val dataSchema = StructType(Seq(
  StructField("label", ScalarType.Double),
  StructField("features", TensorType.Double(10))
)).get
// The features are passed as a plain Seq[Double]
val data = Seq(Row(-9.490009878824548, Seq(0.4551273600657362, 0.36644694351969087, -0.38256108933468047,
  -0.4458430198517267, 0.33109790358914726, 0.8067445293443565, -0.2624341731773887,
  -0.44850386111659524, -0.07269284838169332, 0.5658035575800715)))
val bundle = (for (bundleFile <- managed(BundleFile("jar:file:/tmp/spark-lrModel.zip"))) yield {
  bundleFile.loadMleapBundle().get
}).tried.get
val model = bundle.root
val to_test = DefaultLeapFrame(dataSchema, data)
val res = model.transform(to_test).get // => Here is the place which raises the exception
I'm a little lost with this type mapping now. Any idea?
Thanks,
Stéphane
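For reference, the ClassCastException above names the mismatch directly: scala.collection.immutable.$colon$colon is Scala's List, so the runtime received the raw Seq where, for a TensorType column, it expects an ml.combust.mleap.tensor.Tensor. Below is a minimal, untested sketch of building the row with Tensor.denseVector instead. It assumes the MLeap 0.10-era runtime API with mleap-runtime and scala-arm on the classpath, reuses the bundle path from the question, and assumes the model writes to Spark's default "prediction" output column; the object and method names are hypothetical.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.core.types.{ScalarType, StructField, StructType, TensorType}
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
import ml.combust.mleap.tensor.Tensor
import resource._

object TensorRowSketch {
  // Same 10 feature values as in the question.
  private val featureValues: Array[Double] = Array(
    0.4551273600657362, 0.36644694351969087, -0.38256108933468047,
    -0.4458430198517267, 0.33109790358914726, 0.8067445293443565,
    -0.2624341731773887, -0.44850386111659524, -0.07269284838169332,
    0.5658035575800715)

  // Builds a one-row LeapFrame whose "features" value is a dense Tensor
  // of 10 doubles, matching TensorType.Double(10) in the schema.
  def buildFrame(): DefaultLeapFrame = {
    val dataSchema = StructType(Seq(
      StructField("label", ScalarType.Double),
      StructField("features", TensorType.Double(10))
    )).get
    val data = Seq(Row(-9.490009878824548, Tensor.denseVector(featureValues)))
    DefaultLeapFrame(dataSchema, data)
  }

  // Loads the bundle saved from Spark and scores the frame.
  // Assumption: the output column is Spark's default "prediction".
  def score(bundlePath: String): Double = {
    val bundle = (for (bundleFile <- managed(BundleFile(bundlePath))) yield {
      bundleFile.loadMleapBundle().get
    }).tried.get
    val result = bundle.root.transform(buildFrame()).get
    result.select("prediction").get.dataset.head.getDouble(0)
  }

  def main(args: Array[String]): Unit =
    println(score("jar:file:/tmp/spark-lrModel.zip"))
}
```

The only change from the question's code is the value placed in the row; the schema with TensorType.Double(10) is kept as-is, since that is the type the first error message reports the model expects.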
Answering my own question: it's not a good idea to start from the Spark examples, because their data is already in libsvm format, so the features are already gathered into a single vector. In that situation the mapping doesn't seem to be possible. Starting instead from a basic example with a full pipeline (VectorAssembler + the ML model), it works fine.
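The full-pipeline approach can be sketched as follows, assuming Spark 2.3 with mleap-spark, mleap-runtime and scala-arm on the classpath. The column names x0, x1, x2, the toy training rows, the bundle path /tmp/spark-pipeline.zip and the object name are all hypothetical. Because the VectorAssembler is exported inside the bundle, the MLeap runtime schema only needs plain ScalarType.Double columns and no tensor has to be built by hand. The two support imports are kept local to each method to avoid any clash between their implicit helpers.

```scala
import java.nio.file.{Files, Paths}
import ml.combust.bundle.BundleFile
import ml.combust.mleap.core.types.{ScalarType, StructField, StructType}
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession
import resource._

object PipelineBundleSketch {
  // Hypothetical raw feature columns and bundle location.
  val inputCols: Array[String] = Array("x0", "x1", "x2")
  val bundlePath = "jar:file:/tmp/spark-pipeline.zip"

  // Spark side: fit VectorAssembler + LinearRegression as one pipeline and
  // export the whole PipelineModel, assembler included, as an MLeap bundle.
  def exportPipeline(): Unit = {
    import ml.combust.mleap.spark.SparkSupport._
    val spark = SparkSession.builder().master("local[*]").appName("mleap-sketch").getOrCreate()
    import spark.implicits._

    // Toy training rows (hypothetical values): three scalar features and a label.
    val training = Seq(
      (1.0, 0.0, 0.0, 1.1),
      (0.0, 1.0, 0.0, 2.0),
      (0.0, 0.0, 1.0, -0.9),
      (1.0, 1.0, 1.0, 2.1),
      (2.0, 0.5, -1.0, 4.0),
      (-1.0, 2.0, 0.5, 2.4)
    ).toDF("x0", "x1", "x2", "label")

    val assembler = new VectorAssembler().setInputCols(inputCols).setOutputCol("features")
    val lr = new LinearRegression().setMaxIter(10)
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)

    // Start from a fresh zip so a bundle from a previous run is not reused.
    Files.deleteIfExists(Paths.get("/tmp/spark-pipeline.zip"))
    val context = SparkBundleContext().withDataset(model.transform(training))
    for (bundleFile <- managed(BundleFile(bundlePath))) {
      model.writeBundle.save(bundleFile)(context).get
    }
    spark.stop()
  }

  // MLeap runtime side: the input schema is just the raw scalar columns.
  def score(x0: Double, x1: Double, x2: Double): Double = {
    import ml.combust.mleap.runtime.MleapSupport._
    val schema = StructType(inputCols.toSeq.map(c => StructField(c, ScalarType.Double))).get
    val frame = DefaultLeapFrame(schema, Seq(Row(x0, x1, x2)))
    val bundle = (for (bundleFile <- managed(BundleFile(bundlePath))) yield {
      bundleFile.loadMleapBundle().get
    }).tried.get
    bundle.root.transform(frame).get.select("prediction").get.dataset.head.getDouble(0)
  }

  def run(): Double = {
    exportPipeline()
    score(1.5, 0.3, 0.1)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

In this setup the assembling step travels with the model, so callers of the MLeap runtime only deal with scalar inputs, which is why the type-mapping problem from the question does not come up.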