Reputation: 15
I have a 10-node cluster; each machine has 16 cores and 126.04 GB of RAM. The application's input dataset is around 1 TB, split across 10-15 files, and the job performs some aggregation (groupBy).
The job will run on YARN as the resource scheduler.

My question: how do I pick num-executors, executor-memory, executor-cores, driver-memory, and driver-cores?
Upvotes: 1
Views: 157
Reputation: 5032
I tend to use this tool for sizing my Spark jobs: http://spark-configuration.luminousmen.com/ . The process takes some trial and error, but it helps in the long run.

Additionally, you can read up on how Spark memory works: https://luminousmen.com/post/dive-into-spark-memory
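As a starting point before profiling, a common rule of thumb (this is a sketch based on general YARN sizing advice, not on the linked tool) is: cap executor-cores at around 5 to keep HDFS I/O throughput healthy, reserve roughly one core and 1 GB per node for the OS and Hadoop daemons, subtract ~10% of each executor's memory for off-heap overhead (spark.executor.memoryOverhead), and leave one executor slot for the YARN ApplicationMaster. Applied to the cluster in the question:

```python
import math

def executor_layout(nodes, cores_per_node, mem_per_node_gb,
                    executor_cores=5, overhead_frac=0.10):
    """Rule-of-thumb executor sizing for Spark on YARN."""
    # Reserve 1 core and 1 GB per node for the OS / Hadoop daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_per_node_gb - 1
    executors_per_node = usable_cores // executor_cores
    # Leave one executor slot for the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1
    # Split node memory across executors, then carve out ~10%
    # for spark.executor.memoryOverhead (off-heap).
    mem_per_executor = usable_mem_gb / executors_per_node
    executor_memory_gb = math.floor(mem_per_executor * (1 - overhead_frac))
    return num_executors, executor_cores, executor_memory_gb

# 10 nodes, 16 cores, ~126 GB each (as in the question)
print(executor_layout(10, 16, 126))  # -> (29, 5, 37)
```

So a reasonable first pass is --num-executors 29 --executor-cores 5 --executor-memory 37g, with a modest driver (e.g. a few GB and 1-2 cores, since the groupBy work happens on the executors). Treat these as a baseline and tune from there by watching spill and GC in the Spark UI.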
Upvotes: 1