Reputation: 18601
Imagine the following code:
def myUdf(arg: Int) = udf((vector: MyData) => {
// complex logic that returns a Double
})
How can I define the return type for myUdf so that people looking at the code will know immediately that it returns a Double?
Upvotes: 6
Views: 17028
Reputation: 10153
You can pass type parameters to udf, but, somewhat counter-intuitively, the return type comes first, followed by the input types, as in [ReturnType, ArgTypes...], at least as of Spark 2.3.x. Using the original example (which seems to be a curried function based on arg):
def myUdf(arg: Int) = udf[Double, Seq[Int]]((vector: Seq[Int]) => {
13.37 // whatever
})
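To check that the declared return type really ends up in the DataFrame schema, here is a minimal sketch. It assumes a local SparkSession; the object name TypedUdfSketch, the column names xs and scaled, and the sum-times-arg body are made up for illustration and are not from the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.DoubleType

object TypedUdfSketch {
  // Return type first, then the input type: udf[RT, A1].
  // The body is a placeholder for the "complex logic" in the question.
  def myUdf(arg: Int) =
    udf[Double, Seq[Int]]((vector: Seq[Int]) => vector.sum.toDouble * arg)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // One row holding an array column named "xs" (hypothetical data).
    val df = Seq(Seq(1, 2, 3)).toDF("xs")
    val result = df.select(myUdf(2)(col("xs")).as("scaled"))

    // The declared Double return type appears as DoubleType in the schema.
    assert(result.schema("scaled").dataType == DoubleType)
    result.show()

    spark.stop()
  }
}
```

If the lambda body returned something other than a Double, the explicit type parameters would turn that into a compile error instead of a silently different column type.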
Upvotes: 3
Reputation: 18601
The Spark functions object defines several udf methods with the following signature: static <RT, A1, ..., A10> UserDefinedFunction
You can specify the return type, followed by the input types, in square brackets as follows:
def myUdf(arg: Int) = udf[Double, MyData]((vector: MyData) => {
// complex logic that returns a Double
})
Upvotes: 4
Reputation: 13001
There is nothing special about UDFs with lambda functions; they behave just like Scala lambda functions (see Specifying the lambda return type in Scala), so you could do:
def myUdf(arg: Int) = udf(((vector: MyData) => {
// complex logic that returns a Double
}): (MyData => Double))
or instead explicitly define your function:
// Note the explicit result type and "=": procedure syntax would return Unit
def myFuncWithArg(arg: Int): MyData => Double = {
  def myFunc(vector: MyData): Double = {
    // complex logic that returns a Double. Use arg here
  }
  myFunc _
}
def myUdf(arg: Int) = udf(myFuncWithArg(arg))
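Both options can be tried without Spark, since they are plain Scala. The sketch below assumes a hypothetical MyData case class and a placeholder body (sum times arg); neither comes from the question. Either resulting function could then be passed to udf.

```scala
// Hypothetical stand-in for the question's MyData type.
final case class MyData(values: Seq[Double])

object ReturnTypeSketch {
  // Option 1: type ascription on the lambda itself.
  def ascribed(arg: Int) =
    ((vector: MyData) => vector.values.sum * arg): (MyData => Double)

  // Option 2: an inner method with an explicit return type, lifted to a function.
  def viaMethod(arg: Int): MyData => Double = {
    def myFunc(vector: MyData): Double = vector.values.sum * arg
    myFunc _
  }

  def main(args: Array[String]): Unit = {
    val data = MyData(Seq(1.0, 2.0, 3.0))
    println(ascribed(2)(data))  // prints 12.0
    println(viaMethod(2)(data)) // prints 12.0
  }
}
```

In both cases a reader sees Double in the signature, and the compiler rejects a body that returns anything else.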
Upvotes: 2
Reputation: 27373
I see two ways to do it. Either define a method first and then lift it to a function:
def myMethod(vector:MyData) : Double = {
// complex logic that returns a Double
}
val myUdf = udf(myMethod _)
or define a function first with explicit type:
val myFunction: Function1[MyData,Double] = (vector:MyData) => {
// complex logic that returns a Double
}
val myUdf = udf(myFunction)
I normally use the first approach for my UDFs.
Upvotes: 6