user9787263

Reputation:

Unable to set spark driver memory

I am building a Spark session (on Apache Spark version 2.4.3) from a Jupyter notebook as follows:

from pyspark.sql import SparkSession

spark_session = (SparkSession.builder
                 .master("yarn-client")
                 .enableHiveSupport()
                 .getOrCreate())

spark_session.conf.set("spark.executor.memory", "8g")
spark_session.conf.set("spark.executor.cores", "3")
spark_session.conf.set("spark.cores.max", "3")
spark_session.conf.set("spark.driver.memory", "8g")
sc = spark_session.sparkContext

I can see from the application master that all the parameters are being set properly except spark.driver.memory: no matter what value I set, only 1 GB is used for the driver.

I have checked spark-defaults.conf and there is no entry for spark.driver.memory there. To check whether the problem is with the session builder/Jupyter, I ran an application using spark-submit from the command line and, to my surprise, it picks up the driver memory I pass.

Can someone please shed some light on this? Why is only spark.driver.memory not being picked up from Jupyter?

Upvotes: 1

Views: 1626

Answers (1)

gruby

Reputation: 990

A Jupyter notebook launches PySpark in yarn-client mode; the driver memory and some other configs cannot be set through conf because the driver JVM has already started. You must set them on the command line.

So, to your question: when you run Spark in client mode, setting a property via conf.set will not work, because by that point the driver JVM has already started with the default config. That is why the properties are picked up when you pass them from the command line.
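One common workaround inside the notebook itself is to set the PYSPARK_SUBMIT_ARGS environment variable before the first SparkSession is created, so the driver JVM is launched with the right flags. A minimal sketch, assuming yarn-client mode and the 8g values from the question:

```python
import os

# Must be set BEFORE the first SparkSession is created: the driver JVM
# is launched once, and later conf.set() calls cannot resize its heap.
# The memory values here mirror the question; adjust as needed.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master yarn --deploy-mode client "
    "--driver-memory 8g --executor-memory 8g pyspark-shell"
)

# Then build the session as usual:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.enableHiveSupport().getOrCreate()
```

If a SparkSession already exists in the notebook, restart the kernel first; the environment variable only takes effect for a freshly launched driver.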

A simple way to start pyspark is:

pyspark --driver-memory 2g --executor-memory 2g

Update:

To start Jupyter with custom PySpark arguments, create a custom kernel. More on getting started with Jupyter kernels: http://cleverowl.uk/2016/10/15/installing-jupyter-with-the-pyspark-and-r-kernels-for-spark-development/

And when you are defining "kernel.json", add --driver-memory 2g --executor-memory 2g to the PYSPARK_SUBMIT_ARGS option.
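For illustration, a kernel.json along these lines should work; the paths are placeholders that depend on your installation, and the py4j zip name varies by Spark version:

```json
{
  "display_name": "PySpark",
  "language": "python",
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/path/to/spark",
    "PYTHONPATH": "/path/to/spark/python:/path/to/spark/python/lib/py4j-version-src.zip",
    "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g pyspark-shell"
  }
}
```

The trailing pyspark-shell token in PYSPARK_SUBMIT_ARGS is required; PySpark uses it to decide what to launch.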

Upvotes: 3
