Set spark.driver.memory for Spark running inside a web application

I have a REST API written in Scala with Spray that triggers Spark jobs like the following:

path("vectorize") {
        get {
          parameter('apiKey.as[String]) { (apiKey) =>
            if (apiKey == API_KEY) {
              MoviesVectorizer.calculate() // Spark Job run in a Thread (returns Future)
              complete("Ok")
            } else {
              complete("Wrong API KEY")
            }

          }
        }
      }

I'm trying to find a way to specify the Spark driver memory for these jobs. From what I've found, configuring spark.driver.memory from within the application code doesn't affect anything.

The whole web application, along with Spark, is packaged in a fat JAR. I run it with

java -jar app.jar

Thus, as I understand it, spark-submit is not relevant here (or is it?), so I cannot pass the --driver-memory option when running the app.

Is there any way to set the driver memory for Spark within the web app?

Here's my current Spark configuration:

val spark: SparkSession = SparkSession.builder()
                        .appName("Recommender")
                        .master("local[*]")
                        .config("spark.mongodb.input.uri", uri)
                        .config("spark.mongodb.output.uri", uri)
                        .config("spark.mongodb.keep_alive_ms", "100000")
                        .getOrCreate()

spark.conf.set("spark.executor.memory", "10g")
val sc = spark.sparkContext
sc.setCheckpointDir("/tmp/checkpoint/")
val sqlContext = spark.sqlContext

As the documentation says, the Spark UI Environment tab shows only variables that are affected by the configuration. Everything I set is there - apart from spark.executor.memory.

Upvotes: 1

Views: 1964

Answers (1)

Alper t. Turker

Reputation: 35219

This happens because you use local mode. In local mode there is no real executor - all Spark components run in a single JVM, with a single heap configuration, so executor-specific settings don't matter.
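In other words, when the whole application runs in one JVM, the only memory knob that matters is that JVM's own heap, which you set when launching the fat JAR, for example (the 10g value is just illustrative):

java -Xmx10g -jar app.jar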

spark.executor options are applicable only when an application is submitted to a cluster.

Also, Spark supports only a single application per JVM instance. This means that all core Spark properties will be applied only when the SparkContext is initialized, and they persist as long as the context (not the SparkSession) is kept alive. Since SparkSession initializes the SparkContext, no additional "core" settings can be applied after getOrCreate.

This means that all "core" options should be provided using config method of the SparkSession.builder.

If you're looking for alternatives to embedding Spark, you can check the exemplary answer to Best Practice to launch Spark Applications via Web Application? by T. Gawęda.


Note: Officially, Spark doesn't support applications run outside spark-submit, and there are some elusive bugs related to that.

Upvotes: 2
