lu5er

Reputation: 3564

Notebook vs spark-submit

I'm very new to PySpark.

I am running a script (mainly creating a TF-IDF matrix and using it to predict 9 categorical columns) in a Jupyter Notebook. It takes about 5 minutes when I execute all cells manually. When I run the same script with spark-submit, it takes about 45 minutes. What is happening?

The same slowdown also happens if I run the script with python from the terminal.

I am also setting the configuration in the script as follows:

from pyspark import SparkConf
conf = SparkConf().set('spark.executor.memory', '45G').set('spark.driver.memory', '80G').set('spark.driver.maxResultSize', '20G')

Any help is appreciated. Thanks in advance.

Upvotes: 1

Views: 2624

Answers (2)

Quentin

Reputation: 11

I had the same problem. To initialize my spark variable, I was using this line:

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[1]").appName("Test").getOrCreate()

The problem is that "local[X]" tells Spark to run on the local machine with X worker threads, so "local[1]" uses only a single core. Set X to the number of cores available on your machine, or use "local[*]" to use all of them.

To run on a YARN cluster, use "yarn" as the master instead.
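As a rough sketch (not from the original answer), the master string can be derived from the machine's core count instead of being hard-coded to "local[1]". `choose_master` is a hypothetical helper that only builds the string; you would pass its result to `SparkSession.builder.master(...)`. It uses only the standard library, so it can be tried without Spark installed.

```python
import os


def choose_master(cluster_manager=None, cores=None):
    """Return a Spark master string (hypothetical helper, not a Spark API).

    cluster_manager: pass "yarn" to target a YARN cluster.
    cores: number of local worker threads; defaults to the core count
           reported by os.cpu_count() (falls back to 1 if unknown).
    """
    if cluster_manager == "yarn":
        return "yarn"
    if cores is None:
        cores = os.cpu_count() or 1
    return f"local[{cores}]"


print(choose_master(cores=1))   # local[1], the single-threaded setup from the question
print(choose_master("yarn"))    # yarn
print(choose_master())          # local[N], where N is this machine's core count
```

Usage would look like `SparkSession.builder.master(choose_master()).appName("Test").getOrCreate()`. Computing N explicitly gives the same effect as "local[*]" but lets you cap the thread count if other processes share the machine.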

There are many other options listed here: https://spark.apache.org/docs/latest/submitting-applications.html

Upvotes: 1

Neeraj Bhadani

Reputation: 3110

There are various ways to run your Spark code, such as the ones you mentioned: Jupyter Notebook, the pyspark shell, and spark-submit.

  1. Regarding Jupyter Notebook or the pyspark shell.

When you run your code in a Jupyter notebook or the pyspark shell, the environment may have set default values for executor memory, driver memory, executor cores, etc.

  2. Regarding spark-submit.

However, when you use spark-submit, these defaults may be different. The best approach is to pass these values explicitly as flags when submitting the PySpark application with the spark-submit utility.

  3. Regarding the configuration object you created: it can be passed when creating the SparkContext (sc).

from pyspark import SparkContext
sc = SparkContext(conf=conf)
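A minimal sketch of passing the question's settings as spark-submit flags instead of setting them in the script. This matters for driver memory in particular: the Spark configuration docs note that in client mode `spark.driver.memory` must not be set through SparkConf in the application, because the driver JVM has already started by then, so `--driver-memory` (or the defaults file) should be used instead. `build_spark_submit` and `my_script.py` are hypothetical names, and `--master yarn` is an assumption about the cluster; the helper only builds and prints the command (Python 3.8+ for `shlex.join`), it does not run it.

```python
import shlex


def build_spark_submit(script, master, driver_memory, executor_memory, extra_conf=None):
    """Build a spark-submit argument list (hypothetical helper; does not execute it)."""
    cmd = [
        "spark-submit",
        "--master", master,
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
    ]
    # Properties without a dedicated flag go through --conf key=value.
    for key, value in (extra_conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(script)
    return cmd


cmd = build_spark_submit(
    "my_script.py",          # hypothetical script name
    master="yarn",           # assumption: adjust to your cluster manager
    driver_memory="80G",     # values taken from the question's SparkConf
    executor_memory="45G",
    extra_conf={"spark.driver.maxResultSize": "20G"},
)
print(shlex.join(cmd))
# spark-submit --master yarn --driver-memory 80G --executor-memory 45G --conf spark.driver.maxResultSize=20G my_script.py
```

Printing the command with `shlex.join` keeps it shell-safe, so it can be pasted into a terminal or passed as a list to `subprocess.run` if you want to launch it from Python.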

Hope this helps.

Regards,

Neeraj

Upvotes: 6
