TSAR

Reputation: 683

Setting PySpark executor.memory and executor.cores within Jupyter Notebook

I am initializing PySpark from within a Jupyter Notebook as follows:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("PySpark-testing-app").setMaster("yarn")
conf = (conf.set("deploy-mode", "client")
        .set("spark.driver.memory", "20g")
        .set("spark.executor.memory", "20g")
        .set("spark.driver.cores", "4")
        .set("spark.num.executors", "6")
        .set("spark.executor.cores", "4"))

sc = SparkContext(conf=conf)
sqlContext = SQLContext.getOrCreate(sc)

However, when I open the YARN UI and look at "RUNNING Applications", I see my session allocated 1 container, 1 vCore, and 1 GB of RAM, i.e. the default values! How can I get the desired allocation by passing the values listed above?

Upvotes: 5

Views: 9447

Answers (2)

Oresto

Reputation: 335

Execute

%%configure -f
{
    "driverMemory" : "20G",
    "executorMemory": "20G"
}

at the top of the notebook, before the Spark session is initialized.
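
If you also need to set the executor cores and the number of executors, the same cell magic accepts additional fields. A minimal sketch, assuming a sparkmagic/Livy kernel (which is where %%configure is available; the field names follow Livy's session API):

%%configure -f
{
    "driverMemory": "20G",
    "executorMemory": "20G",
    "executorCores": 4,
    "numExecutors": 6
}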

Upvotes: 0

Jack_H

Reputation: 51

Jupyter notebook launches PySpark in yarn-client mode, so the driver memory and some other configs cannot be set through the SparkConf class; you must set them on the command line.

Take a look at the official documentation's note on setting the driver memory:

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.

There is another way to do it:

import os

# Must be set before the SparkContext is created.
memory = '20g'
pyspark_submit_args = ' --driver-memory ' + memory + ' pyspark-shell'
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

Other configs should be set in the same way, as sketched below.
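
For example, a minimal sketch that passes the executor settings through PYSPARK_SUBMIT_ARGS as well (the flags are standard spark-submit options, and the environment variable must be set before the SparkContext is created):

import os

# Mirror the spark-submit flags; this must run before the SparkContext exists.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master yarn --deploy-mode client "
    "--driver-memory 20g "
    "--executor-memory 20g --executor-cores 4 --num-executors 6 "
    "pyspark-shell"
)

from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("PySpark-testing-app"))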

Upvotes: 5
