Reputation: 7440
I'm trying to increase memory allocation for my executors and drivers in Spark, but I have the strange feeling that Spark is ignoring my configurations.
I'm using the following command:
spark-submit spark_consumer.py --driver-memory=10G --executor-memory=5G --conf spark.executor.extraJavaOptions='-XX:+UseParallelGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps'
My initialization code is
from pyspark import SparkContext
from pyspark.sql import SQLContext

class SparkRawConsumer:

    def __init__(self, filename):
        self.sparkContext = SparkContext.getOrCreate()
        self.sparkContext.setLogLevel("ERROR")
        self.sqlContext = SQLContext(self.sparkContext)
Theoretically, my driver program should have 10GB of total memory available. However, the Spark UI shows that less than 400MB of memory is available.
Why is Spark ignoring the configurations I am passing in?
Upvotes: 0
Views: 1594
Reputation: 7440
The issue here was that I had specified the parameters in the wrong order. Running spark-submit --help clearly shows the expected ordering of the arguments to spark-submit:
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Once I changed the ordering of the parameters, I was able to increase memory on my PySpark app:
spark-submit --driver-memory 8G --executor-memory 8G spark_consumer.py
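As a quick sanity check, here is a minimal sketch (assuming the context is created with SparkContext.getOrCreate(), as in the question) of reading the effective values back from inside the application:

from pyspark import SparkContext

# Reuse the running context and read back the effective configuration.
sc = SparkContext.getOrCreate()

# The keys are only present if they were actually set (by spark-submit or a
# config file); otherwise the supplied default is returned.
print(sc.getConf().get("spark.driver.memory", "not set"))
print(sc.getConf().get("spark.executor.memory", "not set"))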
Upvotes: 0
Reputation: 1771
There are three different ways to define Spark configuration:
1) spark-env.sh
2) spark-submit parameters
3) hard coding the SparkConf, for example: sparkConf.set("spark.driver.memory","10G"); (see the PySpark sketch below)
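For reference, a rough PySpark sketch of option 3, assuming the configuration is set before the SparkContext is created (the app name below is just a placeholder):

from pyspark import SparkConf, SparkContext

# Build the configuration first, then create the context from it.
conf = (SparkConf()
        .setAppName("spark_consumer")
        .set("spark.driver.memory", "10G")
        .set("spark.executor.memory", "5G"))

sc = SparkContext(conf=conf)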
The priority is: hard coding > spark-submit > spark-env.sh.
If you think your parameters are being overwritten by something else, you can check them with sparkConf.getOption("spark.driver.memory");
If you want to be sure your options are not overwritten, hard code them.
You can see all the options here: https://spark.apache.org/docs/latest/configuration.html
Upvotes: 1