Reputation: 23
I am fairly new to Spark and Scala. From my understanding, you should be able to pass a closure into a map function and have it transform the values, but I am getting a Task not serializable
error when I attempt this.
My code is as follows:
// Spark Context
val sparkContext = spark.sparkContext
val random = scala.util.Random
// RDD Initialization
val array = Seq.fill(500)(random.nextInt(51))
val RDD = sparkContext.parallelize(array)
// Spark Operations for Count, Sum, and Mean
val count = RDD.count()
val sum = RDD.reduce(_ + _)
val mean = sum / count
//Output Count, Sum, and Mean
println("Count: " + count)
println("Sum: " + sum)
println("Mean: " + mean)
val difference = (x:Int) => {x - mean}
var differences = RDD.map(difference)
Any help would be greatly appreciated.
Upvotes: 0
Views: 76
Reputation: 1121
Instead of defining the function with def, try using a val:
val difference = (x: Int) => x - mean
When you use def
to define a function, Spark has to serialize the object that contains that method. This usually results in a Task not serializable error, because something (a val or var) in that object may not be serializable.
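For illustration, here is a minimal sketch of the difference, assuming the code lives inside a driver class that is not itself serializable (the class name StatsJob and its layout are assumptions for this example, not your actual code):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical, non-serializable driver class (name and structure are assumptions).
class StatsJob(sc: SparkContext) {

  val data: RDD[Int] = sc.parallelize(Seq.fill(500)(scala.util.Random.nextInt(51)))
  val mean: Long = data.reduce(_ + _) / data.count()

  // Referencing a def (a method of this class) inside map forces Spark to
  // serialize the whole StatsJob instance -> Task not serializable.
  def differenceDef(x: Int): Long = x - mean
  def badDifferences: RDD[Long] = data.map(differenceDef)

  // Copying the field into a local val first means the closure captures only
  // a primitive, not the enclosing instance, so it serializes cleanly.
  def goodDifferences: RDD[Long] = {
    val localMean = mean
    data.map(x => x - localMean)
  }
}

Note that even a val function literal will still capture the enclosing instance if it reads a field of that instance, so the safest pattern is to copy whatever the closure needs (here, mean) into a local val right before the map, as in goodDifferences above.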
Upvotes: 1