Reputation: 51
I'm quite new to Big Data, and I'm currently working on a CLI project that performs some text parsing using Apache Spark.
When a command is typed, a new SparkContext is instantiated and some files are read from an HDFS instance. However, Spark takes too long to initialize a SparkContext, or even a SparkSession object.
So, my question is: is there a way to reuse a SparkContext instance between these commands to reduce this overhead? I've heard about Spark Job Server, but deploying a local server has been difficult since its main guide is a bit confusing.
Thank you.
P.S.: I'm using PySpark.
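
One common workaround, in case it helps frame the question: keep the driver process itself alive and read commands in a loop, so the SparkContext is created only once. A minimal PySpark sketch (the app name, command syntax, and HDFS path here are hypothetical):

```python
from pyspark.sql import SparkSession

# Build the session once; within this long-lived driver process,
# getOrCreate() returns the same session on every later call.
spark = (SparkSession.builder
         .appName("text-parsing-cli")
         .getOrCreate())

# Hypothetical REPL-style loop: because the process stays alive,
# the SparkContext is initialized only once for all commands.
while True:
    command = input("> ").strip()
    if command in ("quit", "exit"):
        break
    if command.startswith("parse "):
        path = command.split(" ", 1)[1]
        # Works with HDFS paths too, e.g. hdfs://namenode:8020/data/input.txt
        df = spark.read.text(path)
        print(df.count())

spark.stop()
```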
Upvotes: 0
Views: 797
Reputation: 11
This is probably not a good idea, because your intermediate shuffle files never get cleaned up unless you explicitly call rdd.unpersist(). If the shuffle files don't get cleaned up, then over time you will start running into disk space issues on the cluster.
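
For illustration, a minimal PySpark sketch of releasing persisted data explicitly (the RDD and its transformation are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cleanup-demo").getOrCreate()
sc = spark.sparkContext

# Hypothetical RDD; once persisted, its blocks stay on the
# executors until they are explicitly released.
rdd = sc.parallelize(range(1000)).map(lambda x: (x % 10, x))
rdd.persist()

# reduceByKey triggers a shuffle, producing intermediate files.
totals = rdd.reduceByKey(lambda a, b: a + b).collect()

# Explicitly release the persisted blocks so a long-running
# context does not accumulate them across many commands.
rdd.unpersist()
spark.stop()
```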
Upvotes: 1