Reputation: 21
I have just started learning Spark. I tried to define a UDF for my DataFrame. It is a very simple function, shown below. The column it is applied to is of type Long in the DataFrame.
def category: (Int => Int) = { a =>
  if (a < 7) {
    return a
  }
  else {
    if (a >= 7 && a < 14) {
      return 8
    }
    else {
      if (a >= 14 && a < 28) {
        return 9
      }
      else {
        return 10
      }
    }
  }
}
import org.apache.spark.sql.functions.udf
val myudf = udf(category)
val df_1 = df.withColumn("ncol", myudf($col))
It always reports errors:
:78: error: type mismatch;
 found   : Int
 required: Int => Int
return a
^
:82: error: type mismatch;
 found   : Int(8)
 required: Int => Int
return 8
^
:86: error: type mismatch;
 found   : Int(9)
 required: Int => Int
return 9
^
:89: error: type mismatch;
 found   : Int(10)
 required: Int => Int
return 10
^
Upvotes: 0
Views: 2796
Reputation: 22449
Given that the column type is Long, your category method should take a Long parameter rather than an Int. Here's how I would define the function:
def category: (Long => Long) = { a =>
  if (a < 7) a else
  if (a < 14) 8 else
  if (a < 28) 9 else
  10
}
val myudf = udf(category)
Or, you could just create the UDF in one code block:
val myudf = udf(
  (a: Long) =>
    if (a < 7) a else
    if (a >= 7 && a < 14) 8 else
    if (a >= 14 && a < 28) 9 else
    10
)
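To sanity-check the bucketing logic without starting a Spark session, the same function can be exercised as a plain Scala value. This is only a minimal sketch: the names CategoryCheck and categoryFn and the sample inputs are illustrative assumptions, not part of the answer above.

```scala
object CategoryCheck {
  // Same branching as the answer's function, written as a plain Long => Long
  // value so it can be tested without Spark. The name is hypothetical.
  val categoryFn: Long => Long = { a =>
    if (a < 7) a
    else if (a < 14) 8L
    else if (a < 28) 9L
    else 10L
  }

  def main(args: Array[String]): Unit = {
    // Check the values on each side of every boundary.
    assert(categoryFn(6L) == 6L)
    assert(categoryFn(7L) == 8L)
    assert(categoryFn(13L) == 8L)
    assert(categoryFn(14L) == 9L)
    assert(categoryFn(27L) == 9L)
    assert(categoryFn(28L) == 10L)
  }
}
```

Once the boundaries behave as expected, the same function value can be passed to udf as shown above.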
Upvotes: 1
Reputation: 8529
Get rid of your return statements. The return keyword is unnecessary in most Scala code, because the last expression is returned automatically (this works for if as well: the whole if/else is an expression that yields a value). Using return when you don't have to is not recommended, as it can cause unexpected behaviour like the errors you are seeing here.
Let's simplify your code:
def category: (Int => Int) = { a =>
  return a
}
This will fail to compile. Here category is a method that takes no arguments and returns a function Int => Int. So far so good. But the statement return a causes the method itself to return the value of a, which is an Int. This breaks because we wanted an Int => Int and we returned an Int. To fix this, just remove the return.
def category: (Int => Int) = { a =>
  a
}
Now the code works, because the last (and only) expression in our method is the function a => a, which is an Int => Int, just like we wanted. The function here is kind of boring: it just returns its input unchanged.
Let's try it with your function:
def category: (Int => Int) = { a =>
  if (a < 7) {
    a
  }
  else {
    if (a >= 7 && a < 14) {
      8
    }
    else {
      if (a >= 14 && a < 28) {
        9
      }
      else {
        10
      }
    }
  }
}
Now it works because, instead of trying to return an Int from within the function, we return the entire function, which is what we want.
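The point that return belongs to the enclosing method, not to the function literal it appears in, can be seen in a small standalone sketch. The names ReturnDemo and firstNegative are made up for illustration, and the non-local return in firstNegative assumes Scala 2 behaviour (as far as I know, newer Scala 3 releases deprecate this pattern).

```scala
object ReturnDemo {
  // Without `return`, the last expression of the method body is the
  // function literal itself, so the method's result type Int => Int holds.
  def category: Int => Int = { a =>
    if (a < 7) a
    else if (a < 14) 8
    else if (a < 28) 9
    else 10
  }

  // Here `return` sits inside the function literal passed to foreach,
  // yet it exits firstNegative itself. That is why its value must match
  // the method's result type (Option[Int]), not the lambda's.
  def firstNegative(xs: List[Int]): Option[Int] = {
    xs.foreach { x => if (x < 0) return Some(x) }
    None
  }

  def main(args: Array[String]): Unit = {
    assert(category(3) == 3)
    assert(category(10) == 8)
    assert(firstNegative(List(1, -2, 3)) == Some(-2))
    assert(firstNegative(List(1, 2)) == None)
  }
}
```

In the original question the same rule applied: each return inside the lambda tried to hand an Int back from category, whose declared result type is Int => Int.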
Upvotes: 2