Rob Jarvis

Reputation: 356

Unable to start/stop Spark in Before and After methods in unit test

I'm writing unit tests for my Spark application but am running into an issue. Ideally, I'd like to have a Before and an After method where I start and stop Spark (rather than doing it in the unit test method itself).

The implementation below works: I can declare the SparkConf and the JavaSparkContext at the beginning of the test and then call sc.close() at the end.

@Test
public void testMethod() {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // tests here

    sc.close();
}

But I would like to set it up so that the start and stop have their own methods:

@Before
public void init() {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
    JavaSparkContext sc = new JavaSparkContext(conf);
}

@After
public void close() {
    sc.close();
}

Any ideas why the latter doesn't work? I run into an exception saying that some object is not serializable.

Upvotes: 3

Views: 1429

Answers (1)

T. Gawęda

Reputation: 16076

  1. First, sc should be a class field, not a local variable of the init() method.
  2. Second, wouldn't it be better to use @BeforeClass and @AfterClass? Spark startup is quite slow (the web UI, cluster, etc. start each time), so if full isolation between tests isn't required, @BeforeClass is the better choice. That said, this is only advice, not the solution to your question.

So the code will look like this:

public class SomeSparkTest implements Serializable {

    private static transient JavaSparkContext jsc;

    @BeforeClass
    public static void init() {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
        jsc = new JavaSparkContext(conf);
    }

    @AfterClass
    public static void close() {
        jsc.close();
    }

    @Test
    public void shouldSomethingHappen() {
        // given
        // preparation of RDDs
        JavaRDD<String> rdd = jsc.textFile(...);

        // when
        // actions
        long count = rdd.count();

        // then
        // verify output; assertThat is from FEST-Assert, not necessary
        assertThat(count).isEqualTo(5);
    }
}

Here you can find a sample test suite from the Spark repository.

Explanation: any lambda or inner class that you use in a test holds a reference to its outer class, which in this case is the test class. Spark always serializes lambdas/inner classes, even when running in local mode. If your test class is not serializable, or if it has any non-serializable field, this error appears.
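To make that concrete, here is a minimal sketch of the effect; the class and helper names (ClosureCaptureTest, NonSerializableHelper) are made up for illustration, not taken from the question. The first test is expected to fail with the serialization error, because the lambda reads an instance field and therefore drags the whole test class into serialization; the second avoids it by copying the value into a local variable first.

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

// Hypothetical example; names are not from the original question.
public class ClosureCaptureTest implements Serializable {

    private static transient JavaSparkContext jsc;

    // A field whose type does NOT implement Serializable.
    private NonSerializableHelper helper = new NonSerializableHelper();

    static class NonSerializableHelper {
        int minimum() { return 2; }
    }

    @BeforeClass
    public static void init() {
        jsc = new JavaSparkContext(new SparkConf().setMaster("local").setAppName("closureCapture"));
    }

    @AfterClass
    public static void close() {
        jsc.close();
    }

    @Test
    public void lambdaTouchingAnInstanceFieldFails() {
        // Reading the instance field `helper` makes the lambda capture `this`,
        // so Spark has to serialize the whole test class and trips over the
        // non-serializable field ("Task not serializable").
        jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5))
           .filter(x -> x > helper.minimum())
           .count();
    }

    @Test
    public void lambdaCapturingOnlyALocalVariableWorks() {
        // Copy the value into a local variable first: the lambda then captures
        // only the int, and nothing non-serializable has to be shipped.
        int min = helper.minimum();
        long count = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5))
                        .filter(x -> x > min)
                        .count();
        // count == 3
    }
}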

Upvotes: 1
