SparkConf not reading spark-submit arguments

Question

SparkConf in PySpark does not read the configuration arguments passed to spark-submit.

My Python code looks something like this:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("foo")
sc = SparkContext(conf=conf)

# processing code...

sc.stop()

and I submit it with

PYSPARK_PYTHON="/opt/anaconda/bin/python" spark-submit foo.py \
--master local[4] --conf="spark.driver.memory=16g" --executor-memory 16g

but none of the configuration arguments are applied. That is, the application runs with the default values: local[*] for master, 1g for driver memory, and 1g for executor memory. I confirmed this in the Spark web UI.

However, the configuration arguments are followed if I use pyspark to submit the application:

PYSPARK_PYTHON="/opt/anaconda/bin/python" pyspark --master local[4] \
--conf="spark.driver.memory=8g"

Notice that --executor-memory 16g was also changed to --conf="spark.executor.memory=16g" because the former doesn't work either.

What am I doing wrong?

Christian Alis · Accepted Answer

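Apparently, the order of the arguments matters. The spark-submit options have to come before the name of the Python script; anything placed after the script name is passed to the script as its own arguments instead of being read by spark-submit. So, the call should be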

PYSPARK_PYTHON="/opt/anaconda/bin/python" spark-submit \
    --master local[4] --conf="spark.driver.memory=16g" --executor-memory 16g foo.py

or, following @glennie-helles-sindholt's advice,

PYSPARK_PYTHON="/opt/anaconda/bin/python" spark-submit \
    --master local[4] --driver-memory 16g --executor-memory 16g foo.py
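
To illustrate why the position of foo.py matters, here is a rough Python model of how the command line gets split. It is only a sketch, not spark-submit's actual parser: the function name split_submit_args and the VALUE_OPTIONS set are made up for this example, and the set covers only a handful of the real options.

```python
# Simplified, hypothetical model of how spark-submit divides its command line.
# VALUE_OPTIONS is an illustrative subset of the options that take a value,
# not the launcher's full list.
VALUE_OPTIONS = {
    "--master",
    "--deploy-mode",
    "--name",
    "--conf",
    "--driver-memory",
    "--executor-memory",
}


def split_submit_args(args):
    """Return (launcher_options, app_file, app_args) for a list of arguments."""
    launcher_options = []
    i = 0
    while i < len(args):
        arg = args[i]
        if not arg.startswith("--"):
            # The first positional argument is taken as the application file.
            # Everything after it is handed to the application untouched.
            return launcher_options, arg, args[i + 1:]
        launcher_options.append(arg)
        # "--opt value" consumes the next token; "--opt=value" does not.
        if arg in VALUE_OPTIONS and i + 1 < len(args):
            launcher_options.append(args[i + 1])
            i += 1
        i += 1
    return launcher_options, None, []


# Script name first, as in the question: no option reaches the launcher.
question = ["foo.py", "--master", "local[4]",
            "--conf=spark.driver.memory=16g", "--executor-memory", "16g"]
print(split_submit_args(question))

# Script name last, as in the answer: every option reaches the launcher.
answer = ["--master", "local[4]", "--conf=spark.driver.memory=16g",
          "--executor-memory", "16g", "foo.py"]
print(split_submit_args(answer))
```

In the first call all of the options end up in the application's argument list, which matches the symptom in the question: foo.py would see them in sys.argv, and Spark falls back to its defaults.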
