zinking
zinking

Reputation: 5695

why is scala method serialisable while function not?

I have a spark RDD defined like this:

val dataset = CreateRDD(data.filter(someFilter))

I observed the following:

//if filter is defined as function, such as following, 
//then spark will throw spark `task not serialisable exception`
val someFilter = (some) => true
//if filter is defined as method, such as following then everything will be fine
def someFilter(some) => true

why ?

yes, function/method are all defined as members in the test spec

Upvotes: 5

Views: 565

Answers (1)

kapunga
kapunga

Reputation: 436

The problem is that this:

val isNegative = (num: Int) => num < 0

is merely syntactic sugar for this:

val isNegative = new Function1[Int, Boolean] {
  def apply(num: Int): Boolean = num < 0
}

Function1 is a Trait and the anonymous function created is not serializable. When you have something like this:

object Tests {
  def isNegative(num: Int): Boolean = num < 0
}

Now isNegative is a member of Tests which is serializable. When you call this:

val dataset = CreateRDD(data.filter(isNegative))

Spark needs to serialize isNegative before shipping it out to each node. Since objects are serializable if all it's members are serializable, when you use def it works fine, however when you use val Spark will instead try and serialize the value of isNegative, which is a non-serializable anonymous function and fail.

Upvotes: 3

Related Questions