Reputation: 1578
When I try to use a UDF that returns a Vector object, Spark throws the following exception:
Cause: java.lang.UnsupportedOperationException: Not supported DataType: org.apache.spark.mllib.linalg.VectorUDT@f71b0bce
How can I use Vector in my UDFs? The Spark version is 1.5.1.
Update:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

val dataFrame: DataFrame = sqlContext.createDataFrame(Seq(
  (0, 1, 2),
  (0, 3, 4),
  (0, 5, 6)
)).toDF("key", "a", "b")

val someUdf = udf {
  (a: Double, b: Double) => Vectors.dense(a, b)
}

// Calling the UDF directly on aggregate expressions throws the exception above
dataFrame.groupBy(col("key"))
  .agg(someUdf(avg("a"), avg("b")))
Upvotes: 1
Views: 1361
Reputation: 330163
There is nothing wrong with your UDF per se. It looks like you get the exception because you call it inside the agg method, on aggregate columns. To make it work you can simply move it outside the agg step:
dataFrame
  .groupBy($"key")
  .agg(avg($"a").alias("a"), avg($"b").alias("b"))
  .select($"key", someUdf($"a", $"b"))
Upvotes: 1