Reputation: 356
I'm writing unit tests for my Spark application but am running into an issue. Ideally, I'd like to have a @Before and @After method where I start and stop Spark (rather than doing it in the unit test method itself).
The implementation below works: I can declare the SparkConf and the JavaSparkContext at the beginning of the test and then call sc.close() at the end.
@Test
public void testMethod() {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // yadda yadda yadda
    // tests here
    sc.close();
}
But I would like to set it up so that the start and stop have their own methods:
@Before
public void init() {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
    JavaSparkContext sc = new JavaSparkContext(conf);
}

@After
public void close() {
    sc.close();
}
Any ideas why the latter doesn't work? I run into an exception saying that some object is not serializable.
Upvotes: 3
Views: 1429
Reputation: 16076
sc should be a class field, not a local variable of the init() method. You might also consider @BeforeClass and @AfterClass: Spark startup takes quite a while (the web UI, cluster, etc. start each time), so if full isolation is not a problem, @BeforeClass would be better. However, that is only advice, not the solution to your question. So the code will look like this example:
public class SomeSparkTest implements Serializable {

    // static so it is shared by all tests; transient so it is not serialized with the test class
    private static transient JavaSparkContext jsc;

    @BeforeClass
    public static void init() {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
        jsc = new JavaSparkContext(conf);
    }

    @AfterClass
    public static void close() {
        jsc.close();
    }

    @Test
    public void shouldSomethingHappen() {
        // given
        // preparation of RDDs
        JavaRDD<String> rdd = jsc.textFile(...);

        // when
        // actions
        long count = rdd.count();

        // then
        // verify output; assertThat is from FEST Assertions, not necessary
        assertThat(count).isEqualTo(5);
    }
}
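If you prefer the per-test @Before/@After structure from your question, the same approach works as long as the context is a class field instead of a local variable. A minimal sketch of that variant, assuming JUnit 4 (class and method names are illustrative):

public class SomeSparkPerTestTest implements Serializable {

    // instance field so that both init() and close() can see it;
    // transient so it is not serialized along with the test class
    private transient JavaSparkContext jsc;

    @Before
    public void init() {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
        jsc = new JavaSparkContext(conf);
    }

    @After
    public void close() {
        jsc.close();
    }
}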
Here you've got a sample test suite from the Spark repository.
Explanation: any lambda or inner class that you use in a test holds a reference to the outer class, in this case the test class. Spark always serializes lambdas/inner classes, even in local cluster mode. If your test class is not serializable, or if it has any non-serializable field, then this error will appear.
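To make that concrete, here is a small sketch (the prefix field, the data, and the expected count are made up for illustration, and java.util.Arrays is assumed to be imported) showing why the test class implements Serializable and keeps the context field transient: the lambda passed to filter references an instance field, so it captures the enclosing test object.

// hypothetical instance field on the test class above
private String prefix = "geo";

@Test
public void shouldFilterByPrefix() {
    JavaRDD<String> rdd = jsc.parallelize(Arrays.asList("geo1", "geo2", "other"));

    // the lambda captures "prefix" and therefore "this" (the test class),
    // so Spark serializes the whole test object; this only works because the
    // class implements Serializable and jsc is transient
    long count = rdd.filter(s -> s.startsWith(prefix)).count();

    assertThat(count).isEqualTo(2);
}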
Upvotes: 1