Reputation: 1023
In my application I'm creating a SparkSession object, then trying to read my properties file and set the properties at runtime. But it is not picking up the properties that I am passing at runtime.
I am submitting my app in YARN cluster mode.
This is my initial SparkSession object, which I am creating in a trait:
val spark = SparkSession.builder().appName("MyApp").enableHiveSupport().getOrCreate()
Then in my main function, which is inside an object, I extend this trait, so my SparkSession is initialized in the trait, and in the object (containing main) I am setting this:
spark.conf.set("spark.sql.hive.convertMetastoreParquet", false)
spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", true)
spark.conf.set("spark.dynamicAllocation.enabled", true)
spark.conf.set("spark.shuffle.service.enabled", true)
spark.conf.set("spark.dynamicAllocation.minExecutors", 40)
So ideally my app should start with 40 executors, but it starts and then runs entirely with the default 2 executors.
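For reference, the overall layout looks roughly like this (the trait and object names here are placeholders, not my real ones):
import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {
  // The session is created as soon as the trait is mixed in
  val spark: SparkSession =
    SparkSession.builder().appName("MyApp").enableHiveSupport().getOrCreate()
}

object MyApp extends SparkSessionWrapper {
  def main(args: Array[String]): Unit = {
    // Properties are set here, after the session already exists,
    // and these settings are not picked up
    spark.conf.set("spark.dynamicAllocation.minExecutors", 40)
  }
}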
Upvotes: 5
Views: 5023
Reputation: 51
There is nothing unexpected here. Only a certain subset of Spark SQL properties (prefixed with spark.sql) can be set at runtime (see the SparkConf documentation):
Once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.
The remaining options have to be set before the SparkContext is initialized. That means initializing the SparkSession with an existing SparkContext:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf() // Set options here
val sc = new SparkContext(conf)
val spark = SparkSession.builder().getOrCreate() // reuses the SparkContext created above
or with the config method of SparkSession.Builder and a SparkConf:
val conf: SparkConf = new SparkConf() // Set options here
val spark = SparkSession.builder.config(conf).getOrCreate()
or key-value pairs:
val spark = SparkSession.builder.config("spark.some.key", "some_value").getOrCreate()
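Applied to the properties from the question, a minimal sketch (only the non-spark.sql options need to move here; spark.sql.hive.convertMetastoreParquet can still be set at runtime):
val spark = SparkSession.builder()
  .appName("MyApp")
  .enableHiveSupport()
  // These must be known before the SparkContext starts
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "40")
  .getOrCreate()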
This applies in particular to spark.dynamicAllocation.enabled, spark.shuffle.service.enabled, and spark.dynamicAllocation.minExecutors.
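Since these values must be known before the SparkContext starts, you can also pass them to spark-submit instead of hard-coding them (the class name and jar below are placeholders):
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=40 \
  --class com.example.MyApp myapp.jar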
mapreduce.input.fileinputformat.input.dir.recursive, on the other hand, is a property of the Hadoop configuration, not Spark, and should be set there:
spark.sparkContext.hadoopConfiguration.set("some.hadoop.property", "some_value")
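For the recursive input property from the question, that would be:
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.input.fileinputformat.input.dir.recursive", "true")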
Upvotes: 5