pikkvile

Reputation: 2521

Apache Spark and non-serializable application context

I'm new to Spark.

I want to parallelize my computations using Spark and a map-reduce approach. But these computations, which I put into a PairFunction implementation for the map stage, require some context to be initialized. This context includes several singleton objects from a 3rd-party jar, and these objects are not serializable, so I cannot ship them to the worker nodes and cannot use them in my PairFunction.

So my question is: can I somehow parallelize a job that requires a non-serializable context using Apache Spark? Are there any other solutions? Maybe I can somehow tell Spark to initialize the required context on every worker node?
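To make the problem concrete, here is a minimal sketch of the pattern I mean; ThirdPartyContext and compute are placeholders I made up for this question, not the real class names from the jar:

import org.apache.spark.{SparkConf, SparkContext}

// Stand-in for one of the 3rd-party singletons; note it is NOT Serializable
class ThirdPartyContext {
  def compute(key: String): Int = key.length
}

object Example {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("example").setMaster("local[*]"))
    val ctx = new ThirdPartyContext  // created on the driver

    // The map closure captures ctx, so Spark has to serialize it for the
    // executors and fails with a NotSerializableException.
    val pairs = sc.parallelize(Seq("a", "b", "c"))
      .map(key => (key, ctx.compute(key)))
    pairs.collect().foreach(println)
  }
}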

Upvotes: 3

Views: 695

Answers (1)

Wilson Liao

Reputation: 638

You can try initializing the objects from your 3rd-party jar on the executors by using mapPartitions or foreachPartition, so the non-serializable context is created on each worker instead of being serialized from the driver.

rdd.foreachPartition { iter =>
  // initialize the non-serializable context here, once per partition,
  // so it is created on the executor instead of shipped from the driver
  val ctx = new XXX()
  iter.foreach { p =>
    // then you can use ctx here for each element
  }
}
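If you need to produce a result RDD instead of just side effects, the same idea works with mapPartitions. A rough sketch, where XXX and compute stand in for your 3rd-party class and whatever method you actually call:

val results = rdd.mapPartitions { iter =>
  // the context is constructed inside the closure, on the executor,
  // so it is never serialized or shipped from the driver
  val ctx = new XXX()
  iter.map { p =>
    (p, ctx.compute(p))  // compute is a placeholder for the real call
  }
}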

Upvotes: 2
