Alok

Reputation: 1506

Spark Unit Testing: How to initialize sc only once for all the Suites using FunSuite

I want to write Spark unit tests and I am using FunSuite for it. But I want the SparkContext to be initialized only once, shared by all the suites, and stopped only after all the suites have completed.

abstract class baseClass extends FunSuite with BeforeAndAfter{
  before {
    println("initialize spark context")
  }
  after {
    println("kill spark context")
  }

}



@RunWith(classOf[JUnitRunner])
class A extends baseClass {
  test("for class A") {
    //assert
  }
}

@RunWith(classOf[JUnitRunner])
class B extends baseClass {
  test("for class b") {
    //assert
  }
}

But when I run sbt test, I can see the println statements from baseClass being executed for both test classes. Obviously, when an instance is created for each of the classes A and B, the abstract base class's before and after blocks run again. So how can I achieve my goal, i.e. have the SparkContext initialized only once while all the test cases run?

Upvotes: 3

Views: 3966

Answers (3)

mahmoud mehdi

Reputation: 1590

I strongly recommend using the spark-testing-base library to manage the lifecycle of a SparkContext or SparkSession during your tests. You won't have to pollute your tests by overriding the beforeAll/afterAll methods and managing the lifecycle of the SparkSession/SparkContext yourself.

You can share one SparkSession/SparkContext across all the tests by overriding the following method: def reuseContextIfPossible: Boolean = true

For more details: https://github.com/holdenk/spark-testing-base/wiki/SharedSparkContext
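A minimal sketch of that setup (assuming spark-testing-base and its matching Spark version are on the test classpath; the suite and test names here are illustrative, not from the original answer):

```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

class SharedContextSuite extends FunSuite with SharedSparkContext {

  // Tell spark-testing-base to keep the SparkContext alive and reuse it
  // across suites instead of recreating it for each one.
  override def reuseContextIfPossible: Boolean = true

  test("count words in a small RDD") {
    // `sc` is provided by the SharedSparkContext trait.
    val counts = sc.parallelize(Seq("a", "b", "a"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts("a") === 2)
  }
}
```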

I hope it helps!

Upvotes: 2

Tzach Zohar

Reputation: 37852

If you really want to share the context between suites, you'll have to make it static. Then you can use a lazy value to create it on first use. As for shutting it down, you can leave that to the shutdown hook Spark automatically registers each time a context is created.

It would look something like:

import org.scalatest.FunSuite

abstract class SparkSuiteBase extends FunSuite {
    lazy val sparkContext = SparkSuiteBase.sparkContext
}

// Putting the SparkContext inside the companion object makes it effectively
// static, so a single instance is reused across all suites
object SparkSuiteBase {
    private lazy val sparkContext = ??? // create the context here
}
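Filled in, the sketch above might look like the following (the local-mode master, app name, and `ExampleSuite` are illustrative choices, not part of the original answer):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.FunSuite

// Shared-context pattern: the context lives in a companion object,
// so it is created once (lazily) and reused by every suite that extends the base.
abstract class SparkSuiteBase extends FunSuite {
  lazy val sparkContext: SparkContext = SparkSuiteBase.sparkContext
}

object SparkSuiteBase {
  // Created on first access; Spark's own shutdown hook stops it at JVM exit.
  private lazy val sparkContext: SparkContext =
    new SparkContext(new SparkConf().setMaster("local[2]").setAppName("shared-test-context"))
}

class ExampleSuite extends SparkSuiteBase {
  test("count a small RDD") {
    assert(sparkContext.parallelize(1 to 10).count() === 10)
  }
}
```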

Upvotes: 0

Tzach Zohar

Reputation: 37852

Option 1: Use the excellent https://github.com/holdenk/spark-testing-base library that does exactly that (and provides many other nice treats). After following the readme, it's as simple as mixing in SharedSparkContext instead of your baseClass, and you'll have an sc: SparkContext value ready to use in your tests.

Option 2: To do it yourself, you'd want to mix in BeforeAndAfterAll rather than BeforeAndAfter, and implement beforeAll and afterAll, which is exactly what the above-mentioned SharedSparkContext does.
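A do-it-yourself version of option 2 might look like this sketch (the class name and local-mode configuration are illustrative; note that beforeAll/afterAll run once per suite, whereas the library can also reuse the context across suites):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

abstract class SparkTestBase extends FunSuite with BeforeAndAfterAll {
  @transient protected var sc: SparkContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    // Created once, before the suite's first test runs.
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("suite-test"))
  }

  override def afterAll(): Unit = {
    try {
      // Stopped once, after the suite's last test has finished.
      if (sc != null) sc.stop()
    } finally {
      super.afterAll()
    }
  }
}
```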

Upvotes: 1
