Piyush Shrivastava

Reputation: 1098

Can't initialize Spark Context while using sbt test

I have written unit test cases for Spark using Scala with the Specs2 framework. In some of the tests, I create a SparkContext and pass functions to it:

    import org.apache.spark.{SparkConf, SparkContext}

    // Create a local SparkContext and run the function under test
    val conf = new SparkConf().setAppName("test").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(arr)
    val output = Util.getHistograms(rdd, header, skipCols, nBins)

These tests execute correctly in the Eclipse JUnit plug-in with no errors or failures, but when I run sbt test, I get a strange exception and the tests fail with errors.

[info] Case 8: getHistograms should
[error]   ! return with correct output
[error]    akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique! (ChildrenContainer.scala:192)
[error] akka.actor.dungeon.ChildrenContainer$TerminatingChildrenContainer.reserve(ChildrenContainer.scala:192)
[error] akka.actor.dungeon.Children$class.reserveChild(Children.scala:77)
[error] akka.actor.ActorCell.reserveChild(ActorCell.scala:369)
[error] akka.actor.dungeon.Children$class.makeChild(Children.scala:202)
[error] akka.actor.dungeon.Children$class.attachChild(Children.scala:42)
[error] akka.actor.ActorCell.attachChild(ActorCell.scala:369)
[error] akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv.actorRef$lzycompute$1(AkkaRpcEnv.scala:92)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$actorRef$1(AkkaRpcEnv.scala:92)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$setupEndpoint$1.apply(AkkaRpcEnv.scala:148)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$setupEndpoint$1.apply(AkkaRpcEnv.scala:148)
[error] org.apache.spark.rpc.akka.AkkaRpcEndpointRef.actorRef$lzycompute(AkkaRpcEnv.scala:281)
[error] org.apache.spark.rpc.akka.AkkaRpcEndpointRef.actorRef(AkkaRpcEnv.scala:281)
[error] org.apache.spark.rpc.akka.AkkaRpcEndpointRef.hashCode(AkkaRpcEnv.scala:329)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv.registerEndpoint(AkkaRpcEnv.scala:73)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv.setupEndpoint(AkkaRpcEnv.scala:149)
[error] org.apache.spark.executor.Executor.<init>(Executor.scala:89)
[error] org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalBackend.scala:57)
[error] org.apache.spark.scheduler.local.LocalBackend.start(LocalBackend.scala:119)
[error] org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
[error] org.apache.spark.SparkContext.<init>(SparkContext.scala:514)
[error] UtilTest$$anonfun$8$$anonfun$apply$29.apply(UtilTest.scala:113)
[error] UtilTest$$anonfun$8$$anonfun$apply$29.apply(UtilTest.scala:111)

I guess that because of this the SparkContext (sc) is not getting created and I am getting a null, but I can't understand what is causing it. Thanks in advance.

Upvotes: 1

Views: 1642

Answers (2)

Erik Schmiegelow

Reputation: 2759

In fact the reason is even simpler: you cannot run multiple Spark contexts in the same JVM at the same time. sbt test executes tests in parallel, meaning that if your tests each spawn a Spark context, the tests will fail.

To prevent this from happening, add the following to your build.sbt:

    // super important with multiple tests running Spark contexts
    parallelExecution in Test := false

which will result in sequential test execution.
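On newer sbt versions (1.x), the slash syntax is equivalent; a sketch of the same setting under that assumption:

    // same setting, written with the sbt 1.x slash syntax
    Test / parallelExecution := false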

Upvotes: 1

Piyush Shrivastava

Reputation: 1098

This was happening because sbt runs all the tests together, and multiple SparkContexts were being created because the Specifications files were run multiple times. To resolve this, add a separate object and initialize your SparkContext in it. Use this sc throughout the test code so that it isn't created multiple times, as in the sketch below.
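A minimal sketch of such a shared object (the object name TestSparkContext is illustrative, not from the original post; it assumes the tests only need a local context):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative shared holder: the lazy val ensures the context is built
    // once per JVM and reused by every specification that touches it.
    object TestSparkContext {
      lazy val sc: SparkContext = {
        val conf = new SparkConf().setAppName("test").setMaster("local[2]")
        new SparkContext(conf)
      }
    }

In the specifications, refer to TestSparkContext.sc instead of constructing a new SparkContext, e.g. val rdd = TestSparkContext.sc.parallelize(arr).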

Upvotes: 1
