Maroof

Reputation: 139

SPARK Task not serializable due to assert statement

I am getting a "Task not serializable" error due to an assert statement in the foreach method on an RDD. Is there any workaround to write an assert for every element of an RDD?

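import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.FunSuite

class myTest extends FunSuite {

  // some code to create the SparkContext (sc), e.g. a local one for tests:
  val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("myTest"))

  val arrRDD = sc.parallelize(Array(1, 1, 1, 1, 1))

  test("custom test") {
    arrRDD.foreach { x =>
      // commenting out this assert removes the error
      assert(x == 1)
    }
  }

}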

Upvotes: 1

Views: 592

Answers (2)

Steven Laan

Reputation: 190

If you want to test this small use case, you can collect the RDD into an actual Scala collection and then use assert on that, just as Shankar said.

If you want a broader scope of unit tests, you can use a unit-testing framework.

Upvotes: 0

koiralo

Reputation: 23119

An RDD (Resilient Distributed Dataset) is a collection that is distributed over the nodes in a cluster. When we work with it, we see it as a collection on a single machine, but that is only due to abstraction.

When you run RDD.map or any other operation such as filter or foreach, the function you pass to it is serialized, moved to the other nodes in the cluster, and executed on those nodes.
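To make the "serialized and moved to other nodes" step concrete, here is a minimal plain-Scala sketch with no Spark involved. It assumes plain Java serialization (which is what Spark uses for task closures) and simulates shipping by writing a function to bytes, reading it back (standing in for the executor on another node), and applying it to some data. The names `ShipDemo`, `IsOne` and `ship` are hypothetical, and Spark does more than this, for example it cleans closures before serializing them.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ShipDemo {
  // A self-contained function: it references nothing outside itself,
  // so serializing it only has to write this small object.
  final class IsOne extends (Int => Boolean) with Serializable {
    def apply(x: Int): Boolean = x == 1
  }

  // Hypothetical helper: serialize a function to bytes and read it back,
  // standing in for "send the task to another node and run it there".
  def ship[A, B](f: A => B): A => B = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(f)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A => B]
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val remoteCheck = ship(new IsOne)
    // The deserialized copy is applied to one "partition" of data.
    println(Array(1, 1, 1, 1, 1).forall(remoteCheck)) // true
  }
}
```

Only objects that Java serialization accepts can make this trip, which is exactly where the error in the question comes from.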

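The "Task not serializable" error comes from arrRDD.foreach: foreach is an action, and the function passed to it must be serialized before it can be sent to the other nodes. Inside a FunSuite, assert is not Scala's plain Predef.assert but a method the test class inherits from ScalaTest, so the function captures the enclosing myTest instance, and that instance (which holds non-serializable state such as the SparkContext) cannot be serialized. So the task cannot be moved to the other nodes.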
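The capture problem can be reproduced without Spark or ScalaTest. In the hypothetical sketch below, `FakeSuite` plays the role of the test class. It is simply made non-serializable here, standing in for a suite that holds non-serializable state such as a SparkContext, and `check` is an instance method on it, the way ScalaTest's `assert` is a method the suite inherits. The function classes are written by hand to mirror, roughly, what the compiler generates for `x => check(x)` inside the class versus a function that does the comparison itself.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object CaptureDemo {
  // Stands in for the test class: it does NOT extend Serializable.
  class FakeSuite {
    def check(x: Int): Unit = require(x == 1, s"$x != 1")
  }

  // Roughly what `x => check(x)` inside FakeSuite compiles to:
  // the function keeps a reference to the enclosing suite instance.
  final class CallsSuite(suite: FakeSuite) extends (Int => Unit) with Serializable {
    def apply(x: Int): Unit = suite.check(x)
  }

  // Does the comparison itself and captures nothing.
  final class SelfContained extends (Int => Unit) with Serializable {
    def apply(x: Int): Unit = require(x == 1, s"$x != 1")
  }

  // True if Java serialization accepts the object (and everything it references).
  def serializes(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    println(serializes(new CallsSuite(new FakeSuite))) // false
    println(serializes(new SelfContained))             // true
  }
}
```

The function body is identical in spirit in both cases; what decides success is whether serialization has to follow a reference back into the non-serializable enclosing object.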

If you are trying to assert the values, you can just collect them, which brings the data to the driver node as an array, and assert them there:

arrRDD.collect().foreach{
  x => assert (x == 1)
}

But I still don't think it is a good way, since collect() brings the entire RDD into the driver's memory!

Hope this helped you :)

Upvotes: 1
