Justin Haindel

Reputation: 23

Scala: Task not serializable when using a closure

I am fairly new to Spark and Scala. From my understanding, you should be able to pass a closure into a map function and have it transform the values, but I am getting a Task not serializable error when attempting this.

My code is as follows:

// Spark Context
val sc = spark.sparkContext
val random = scala.util.Random

// RDD Initialization
val array = Seq.fill(500)(random.nextInt(51))
val RDD = sc.parallelize(array)

// Spark Operations for Count, Sum, and Mean
val count = RDD.count()
val sum = RDD.reduce(_ + _)
val mean = sum / count // integer division; use sum.toDouble / count for a fractional mean

//Output Count, Sum, and Mean
println("Count: " + count)
println("Sum: " + sum)
println("Mean: " + mean)

val difference = (x: Int) => { x - mean }

val differences = RDD.map(difference)

Any help would be greatly appreciated.

Upvotes: 0

Views: 76

Answers (1)

Amit

Reputation: 1121

Instead of defining a function with def, try using a function value (val):

val difference = (x: Int) => { x - mean }

When you use def to define a function, the closure you pass to map captures a reference to the object that owns that method, so Spark tries to serialize the whole object. This usually results in a Task not serializable error, because there is often something in that object (a val, a var, or the SparkContext itself) that is not serializable. A function value defined with val captures only the variables it actually uses.
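A minimal sketch of the difference, assuming the code lives inside a class that holds a SparkContext and is therefore not serializable (the class and method names here are hypothetical):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class Differences(sc: SparkContext) { // holds a SparkContext, so not serializable
  val mean = 25

  // A method: passing it to map captures `this`, so Spark
  // tries to serialize the whole Differences instance and fails.
  def differenceDef(x: Int): Int = x - mean

  def broken(rdd: RDD[Int]): RDD[Int] =
    rdd.map(differenceDef) // org.apache.spark.SparkException: Task not serializable

  // Copying the field into a local val means the function value
  // captures only that Int, which serializes fine.
  def working(rdd: RDD[Int]): RDD[Int] = {
    val localMean = mean
    val difference = (x: Int) => x - localMean
    rdd.map(difference)
  }
}

In a spark-shell script like yours, where mean is already a local val, the function value form avoids dragging in any enclosing object.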

Upvotes: 1
