Reputation: 1673
Suppose I have the following class in Spark Scala:
import org.apache.spark.rdd.RDD

class SparkComputation(i: Int, j: Int) {
  def something(x: Int, y: Int) = (x + y) * i
  def processRDD(data: RDD[Int]) = {
    val j = this.j
    val something = this.something _
    data.map(something(_, j))
  }
}
I get a Task not serializable exception when I run the following code:
val s = new SparkComputation(2, 5)
val data = sc.parallelize(0 to 100)
val res = s.processRDD(data).collect
I'm assuming the exception occurs because Spark is trying to serialize the SparkComputation instance. To prevent this, I have copied the class members used in the RDD operation into local variables (j and something). However, Spark still tries to serialize the SparkComputation object because of the method. Is there any way to pass the class method to map without forcing Spark to serialize the whole SparkComputation class? I know the following code works without any problem:
def processRDD(data: RDD[Int]) = {
  val j = this.j
  val i = this.i
  data.map(x => (x + j) * i)
}
So, the class members that store values are not causing the problem. The problem is with the function. I have also tried the following approach with no luck:
class SparkComputation(i: Int, j: Int) {
  def processRDD(data: RDD[Int]) = {
    val j = this.j
    val i = this.i
    def something(x: Int, y: Int) = (x + y) * i
    data.map(something(_, j))
  }
}
Upvotes: 0
Views: 340
Make the class serializable:
class SparkComputation(i: Int, j: Int) extends Serializable {
  def something(x: Int, y: Int) = (x + y) * i
  def processRDD(data: RDD[Int]) = {
    val j = this.j
    val something = this.something _
    data.map(something(_, j))
  }
}
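The this.something _ in processRDD is an eta-expansion: it builds a function object whose apply method calls something on the enclosing instance, so the closure shipped to the executors still carries a reference to this, and therefore the whole class has to be serializable.

If you want to avoid serializing the instance altogether, here is a minimal sketch of an alternative (not part of the code above; the names localI and localJ are illustrative): copy the fields into locals and use a function value rather than a method, so the closure only captures two Ints.

import org.apache.spark.rdd.RDD

class SparkComputation(i: Int, j: Int) {
  def processRDD(data: RDD[Int]) = {
    // Evaluated on the driver: copy the fields into plain locals.
    val localI = i
    val localJ = j
    // A function value (not a def) whose body only uses localI,
    // so it holds no reference to the enclosing instance.
    val something = (x: Int, y: Int) => (x + y) * localI
    data.map(x => something(x, localJ))
  }
}

Either way, the calling code s.processRDD(data).collect stays the same.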
Upvotes: 1