Arvind Kumar

Reputation: 1335

Memory parameters for Spark-submit command

How do I calculate optimal memory settings for the spark-submit command?

I am loading 4.5 GB of data into Spark from Oracle, performing transformations such as a join with a Hive table, and writing the result back to Oracle. My question is how to arrive at optimal memory parameters for the spark-submit command.

spark-submit --master yarn-cluster --driver-cores 2 \
--driver-memory 2G --num-executors 10 \
--executor-cores 5 --executor-memory 2G \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

How do I calculate what the driver memory should be, how much driver/executor memory is required, how many cores are needed, etc.?

Upvotes: 0

Views: 3429

Answers (1)

ShirishT

Reputation: 232

That is, in general, a complex question with no silver-bullet answer. The optimal choice depends not only on your data characteristics and the types of operations you perform, but also on system behavior (the Spark optimizer, etc.). Some useful tips can be found here
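That said, a common starting point is the rule of thumb from cluster-sizing guides: cap each executor at about 5 cores (for good HDFS throughput), leave one core and ~1 GB per node for OS/Hadoop daemons, reserve one executor slot for the driver, and deduct roughly 7% of each executor's memory for YARN overhead. The sketch below is only a heuristic under an assumed example cluster (10 nodes, 16 cores, 64 GB each); the node counts, the 5-core cap, and the 7% overhead factor are assumptions, not values from the question.

```python
def suggest_spark_resources(nodes, cores_per_node, mem_per_node_gb,
                            cores_per_executor=5):
    """Rule-of-thumb spark-submit sizing; a heuristic sketch, not a guarantee."""
    usable_cores = cores_per_node - 1              # leave 1 core for OS/daemons
    executors_per_node = usable_cores // cores_per_executor
    num_executors = executors_per_node * nodes - 1 # reserve one slot for the driver
    mem_per_executor = (mem_per_node_gb - 1) // executors_per_node  # leave 1 GB for OS
    heap_gb = int(mem_per_executor * 0.93)         # ~7% off for YARN memory overhead
    return num_executors, cores_per_executor, heap_gb

# Assumed example cluster: 10 nodes x 16 cores x 64 GB
n, c, m = suggest_spark_resources(10, 16, 64)
print(f"--num-executors {n} --executor-cores {c} --executor-memory {m}G")
```

For the assumed cluster this yields `--num-executors 29 --executor-cores 5 --executor-memory 19G`; you would then benchmark your actual job and adjust, since shuffle-heavy joins like yours may need more memory per core than this heuristic gives.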

Upvotes: 1
