Reputation: 139
I am getting a "Task not serializable" error caused by an assert statement inside the foreach method on an RDD. Is there any workaround for writing an assert for every element of an RDD?
import org.scalatest.FunSuite

class myTest extends FunSuite {
  // some code to create spark context (sc)
  var arrRDD = sc.parallelize(Array(1, 1, 1, 1, 1))

  test("custom test") {
    arrRDD.foreach { x =>
      // commenting out this assert removes the error
      assert(x == 1)
    }
  }
}
Upvotes: 1
Views: 592
Reputation: 190
If you want to test this small use case, you can collect the RDD into an actual Scala collection and then use assert on that, just as Shankar said.
If you want a broader scope of unit tests you can use a unit testing framework.
Upvotes: 0
Reputation: 23119
An RDD (Resilient Distributed Dataset) is a collection that is distributed over the nodes of a cluster. Thanks to this abstraction, we work with it as if it were a collection on a single machine.
When you run a transformation on an RDD, such as map or filter, the function you pass is serialized, moved to the other nodes in the cluster, and executed on those nodes.
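The "Task not serializable" error comes from the call to arrRDD.foreach: the closure passed to it has to be serialized, but the assert inside it is not serializable (most likely because assert belongs to the test suite instance, so the closure drags the whole suite along with it). As a result, the task cannot be moved to the other nodes.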
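To make the capture problem concrete, here is a minimal sketch that needs neither Spark nor ScalaTest. FakeSuite, check, and canSerialize are hypothetical names invented for this illustration; FakeSuite only stands in for a non-serializable test suite, and plain Java serialization is used as a rough approximation of the check Spark performs before shipping a closure.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureDemo {
  // Hypothetical stand-in for a test suite: it is NOT Serializable,
  // and `check` plays the role of an instance-level assert.
  class FakeSuite {
    def check(cond: Boolean): Unit =
      if (!cond) throw new AssertionError("check failed")

    // Calls an instance method, so the lambda has to capture `this`.
    def capturing: Int => Unit = x => check(x == 1)

    // Uses nothing from the instance, so the suite is not captured.
    def standalone: Int => Boolean = x => x == 1
  }

  // Tries to serialize `obj` with plain Java serialization and reports
  // whether that worked.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val suite = new FakeSuite
    println(s"capturing closure serializable:  ${canSerialize(suite.capturing)}")
    println(s"standalone closure serializable: ${canSerialize(suite.standalone)}")
  }
}
```

The first closure fails only because it pulls in the enclosing FakeSuite, which mirrors why commenting out the assert in the question makes the error go away.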
If you are just trying to assert the values, you can collect the RDD, which brings the data to the driver node as an array, and assert on it like this:
arrRDD.collect().foreach { x =>
  assert(x == 1)
}
But I still don't think this is a good approach!
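If pulling every element to the driver is the concern, one possible alternative is to let the executors count the elements that violate the condition and assert only on that count. This is a sketch under assumptions, not a recommended pattern from the thread: DistributedCheck and countMismatches are hypothetical names, and the local[2] master is just for running it on one machine.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DistributedCheck {
  // Hypothetical helper: counts the elements that differ from `expected`.
  // The lambda captures only `expected` (an Int), so it serializes cleanly.
  def countMismatches(rdd: RDD[Int], expected: Int): Long =
    rdd.filter(_ != expected).count()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("distributed-check")
    val sc = new SparkContext(conf)
    try {
      val arrRDD = sc.parallelize(Array(1, 1, 1, 1, 1))
      val mismatches = countMismatches(arrRDD, 1)
      // The assert runs on the driver, so nothing around it is serialized.
      assert(mismatches == 0, s"found $mismatches elements that are not 1")
    } finally {
      sc.stop()
    }
  }
}
```

Inside a test, the same idea would read assert(DistributedCheck.countMismatches(arrRDD, 1) == 0); only a single number travels back to the driver instead of the whole dataset.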
Hope this helped you :)
Upvotes: 1