Reputation: 5695
I have a Spark RDD defined like this:
val dataset = CreateRDD(data.filter(someFilter))
I observed the following:
// If the filter is defined as a function value, as below,
// Spark throws a `Task not serializable` exception.
val someFilter = (some: Any) => true
// If the filter is defined as a method, as below, everything works fine.
def someFilter(some: Any): Boolean = true
Why?
Yes, the function and the method are both defined as members of the test spec.
Upvotes: 5
Views: 565
Reputation: 436
The problem is that this:
val isNegative = (num: Int) => num < 0
is merely syntactic sugar for this:
val isNegative = new Function1[Int, Boolean] {
  def apply(num: Int): Boolean = num < 0
}
`Function1` is a trait that does not extend `Serializable`, so the anonymous function created is not serializable. When you have something like this:
object Tests {
  def isNegative(num: Int): Boolean = num < 0
}
Now `isNegative` is a member of `Tests`, which is serializable. When you call this:
val dataset = CreateRDD(data.filter(isNegative))
Spark needs to serialize `isNegative` before shipping it out to each node. Since objects are serializable if all their members are serializable, it works fine when you use `def`. When you use `val`, however, Spark instead tries to serialize the value of `isNegative`, which is a non-serializable anonymous function, and fails.
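If you want to check a particular function value yourself, here is a minimal sketch that runs it through plain Java serialization, without Spark. The names `SerializationCheck`, `isJavaSerializable`, `plainAnonymous` and `serializableAnonymous` are made up for this illustration. It only tests the expanded `new Function1` form shown above. How a lambda literal or a member of your own spec behaves can depend on the Scala version and on what the closure captures, so it is worth running the check against your actual value.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical helper: returns true if Java serialization of `value` succeeds,
  // false if a NotSerializableException is thrown along the way.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // The expanded form from the answer: an anonymous class that extends only Function1.
  val plainAnonymous: Int => Boolean = new Function1[Int, Boolean] {
    def apply(num: Int): Boolean = num < 0
  }

  // The same class with Serializable mixed in explicitly.
  val serializableAnonymous: Int => Boolean = new Function1[Int, Boolean] with Serializable {
    def apply(num: Int): Boolean = num < 0
  }
}
```

Calling `SerializationCheck.isJavaSerializable(SerializationCheck.plainAnonymous)` should return `false`, while the same call on `serializableAnonymous` should return `true`. Passing your own `someFilter` value to the helper is a quick way to see whether it would survive being shipped to an executor.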
Upvotes: 3