AJm

Reputation: 1023

SparkSession not picking up Runtime Configuration

In my application I'm creating a SparkSession object, then reading my properties file and setting the properties at runtime. But it is not picking up the properties that I pass at runtime.

I am submitting my app in YARN cluster mode.

This is my initial Spark session object, which I am creating in a trait:

val spark = SparkSession.builder().appName("MyApp").enableHiveSupport().getOrCreate()

Then in my main function, which is inside an object, I extend this trait, so my Spark session is initialized in the trait, and in my object (containing main) I am setting this:

spark.conf.set("spark.sql.hive.convertMetastoreParquet", false)
spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", true)
spark.conf.set("spark.dynamicAllocation.enabled", true)
spark.conf.set("spark.shuffle.service.enabled", true)
spark.conf.set("spark.dynamicAllocation.minExecutors", 40)

So ideally my app should start with 40 executors, but it starts and then runs entirely with the default 2 executors.

Upvotes: 5

Views: 5023

Answers (1)

user9162207

Reputation: 51

There is nothing unexpected here. Only a certain subset of Spark SQL properties (prefixed with spark.sql) can be set at runtime (see the SparkConf documentation):

Once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.
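
To see the difference, compare a spark.sql property with a core one. A minimal sketch (the behavior described is Spark 2.x: non-SQL keys passed to conf.set are stored silently but never consulted by the already-running context):

val spark = SparkSession.builder().appName("Demo").getOrCreate()
spark.conf.set("spark.sql.shuffle.partitions", "10")         // spark.sql.* option: takes effect
spark.conf.set("spark.dynamicAllocation.minExecutors", "40") // stored, but the scheduler never sees it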

The remaining options have to be set before the SparkContext is initialized. That means creating the SparkContext from a pre-configured SparkConf and building the SparkSession on top of it:

val conf: SparkConf = new SparkConf()   // Set options here
val sc = new SparkContext(conf)
val spark = SparkSession.builder().getOrCreate()   // reuses the active SparkContext

with the config method of SparkSession.Builder and a SparkConf:

val conf: SparkConf = new SparkConf()   // Set options here
val spark = SparkSession.builder.config(conf).getOrCreate()

or key-value pairs:

val spark = SparkSession.builder.config("spark.some.key", "some_value").getOrCreate

This applies in particular to spark.dynamicAllocation.enabled, spark.shuffle.service.enabled and spark.dynamicAllocation.minExecutors.
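
Putting this together for the exact properties from the question, a sketch of the corrected setup might look like this (appName and enableHiveSupport carried over from the question's builder; SparkConf values are plain strings):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// All core options go on the SparkConf, before any context exists.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "40")

val spark = SparkSession.builder()
  .appName("MyApp")
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()

// spark.sql.* options can still be changed afterwards at runtime:
spark.conf.set("spark.sql.hive.convertMetastoreParquet", false)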

mapreduce.input.fileinputformat.input.dir.recursive, on the other hand, is a Hadoop configuration property, not a Spark one, and should be set on the Hadoop configuration:

spark.sparkContext.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
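
Since the question mentions YARN cluster mode: all of the pre-initialization options above can equally be supplied at submit time, e.g. spark-submit --conf spark.dynamicAllocation.minExecutors=40, which avoids hard-coding them in the application.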

Upvotes: 5
