Reputation: 6796
SparkConf on pyspark does not read the configuration arguments passed to spark-submit.
My Python code is something like:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("foo")
sc = SparkContext(conf=conf)
# processing code...
sc.stop()
and I submit it with
PYSPARK_PYTHON="/opt/anaconda/bin/python" spark-submit foo.py \
--master local[4] --conf="spark.driver.memory=16g" --executor-memory 16g
but none of the configuration arguments are applied. That is, the application is executed with the default values of local[*] for master, 1g for driver memory and 1g for executor memory. This was confirmed by the Spark GUI.
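The same values can also be checked from inside the application rather than in the GUI. This is a minimal sketch, assuming a live SparkContext named sc as in the code above. The helper names effective_settings and report are hypothetical, and the key list covers only the three settings discussed here.

```python
# Keys of interest; this selection is an assumption for illustration.
KEYS = ("spark.master", "spark.driver.memory", "spark.executor.memory")


def effective_settings(conf_pairs, keys=KEYS):
    """Pick selected settings out of an iterable of (key, value) pairs.

    Keys that were never set explicitly are reported as None; Spark then
    falls back to its built-in defaults for them.
    """
    found = dict(conf_pairs)
    return {key: found.get(key) for key in keys}


def report(sc):
    """Print the selected settings of a live SparkContext `sc`."""
    # SparkConf.getAll() returns the explicitly set (key, value) pairs.
    for key, value in effective_settings(sc.getConf().getAll()).items():
        print(f"{key} = {value}")
```

Calling report(sc) right after the context is created should show whether the values given on the command line reached the application.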
However, the configuration arguments are followed if I use pyspark to submit the application:
PYSPARK_PYTHON="/opt/anaconda/bin/python" pyspark --master local[4] \
--conf="spark.driver.memory=8g"
Notice that --executor-memory 16g was also changed to --conf="spark.executor.memory=16g" because the former doesn't work either.
What am I doing wrong?
Upvotes: 4
Views: 2968
Reputation: 6796
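Apparently, the order of the arguments matters. The name of the Python script must come after all of the spark-submit options, because anything placed after the script is passed to the application as its own arguments instead of being read by spark-submit. So, the call should be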
PYSPARK_PYTHON="/opt/anaconda/bin/python" spark-submit \
--master local[4] --conf="spark.driver.memory=16g" --executor-memory 16g foo.py
or, following @glennie-helles-sindholt's advice,
PYSPARK_PYTHON="/opt/anaconda/bin/python" spark-submit \
--master local[4] --driver-memory 16g --executor-memory 16g foo.py
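To picture why the position of the script matters, here is a rough sketch of the splitting rule. It is a simplified illustration, not spark-submit's actual parser. The function name split_submit_args is hypothetical, and VALUE_OPTIONS lists only the options used in this thread.

```python
# Options that take a value; only the ones from this thread (an assumption,
# the real tool accepts many more).
VALUE_OPTIONS = {"--master", "--conf", "--driver-memory", "--executor-memory"}


def split_submit_args(args):
    """Split a command line into (options, script, application arguments).

    Options are read until the first argument that is not a known option.
    That argument is taken as the script, and everything after it is left
    untouched for the application.
    """
    options = []
    i = 0
    while i < len(args):
        # Accept both "--conf key=value" and "--conf=key=value".
        name, sep, value = args[i].partition("=")
        if name not in VALUE_OPTIONS:
            return options, args[i], args[i + 1:]
        if not sep:
            if i + 1 >= len(args):
                raise ValueError(f"missing value for {name}")
            i += 1
            value = args[i]
        options.append((name, value))
        i += 1
    return options, None, []


# Script first: no options are recognized, all of them go to foo.py.
print(split_submit_args(
    ["foo.py", "--master", "local[4]", "--executor-memory", "16g"]))

# Script last: the options are recognized before foo.py is reached.
print(split_submit_args(
    ["--master", "local[4]", "--executor-memory", "16g", "foo.py"]))
```

With the script first, the sketch returns an empty option list and hands every flag to foo.py, which matches the behaviour seen in the question.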
Upvotes: 2
Reputation: 13154
I believe you need to remove the = sign from --conf=. Your spark-submit script should be
PYSPARK_PYTHON="/opt/anaconda/bin/python" spark-submit foo.py \
--master local[4] --conf spark.driver.memory=16g --executor-memory 16g
Note that spark-submit also supports setting driver memory with the flag --driver-memory 16G.
Upvotes: 3