padmanabh pande

Reputation: 427

spark.executor.cores vs spark.executor.instances: which one should I increase?

To parallelize Spark jobs, both the number of cores and the number of executor instances can be increased. What is the trade-off here, and how should one pick the actual values for both configs?

Upvotes: 2

Views: 2880

Answers (1)

QuickSilver

Reputation: 4045

  • The advantages of increasing the number of cores over the number of executors are the same as the advantages of multithreading over multiple processes.
  • Increasing the number of cores adds more threads within each executor, while increasing the number of executors adds more JVM processes (i.e. more Spark executors across the cluster).
  • If you want to perform multiple operations in parallel on the same Dataset/DataFrame, increase the number of cores per executor.
  • But if you want to process a large Dataset/DataFrame with relatively little parallelism, you can partition your data on a key column and Spark will process each partition on the executor it was allocated to (see the sketch after this list).
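As a minimal sketch of both knobs plus key-column partitioning (the resource values, input path, and `customer_id` column are assumptions for illustration, not from the question):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cores-vs-instances-sketch")
    # more threads (concurrent tasks) inside each executor JVM
    .config("spark.executor.cores", "4")
    # more executor JVMs across the cluster (static allocation)
    .config("spark.executor.instances", "6")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)

df = spark.read.parquet("/data/events")      # hypothetical input path
# Repartition on a key column so rows with the same key land in the
# same partition, which Spark assigns to a single executor.
df_by_key = df.repartition("customer_id")    # hypothetical key column
df_by_key.groupBy("customer_id").count().show()
```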

I recommend that you read this blog post from Cloudera.

Benchmarking your PySpark job by varying the number of executors against the number of threads per executor is the best way to arrive at the right configuration for your application.
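One rough way to run such a benchmark (a sketch; `job.py`, the resource combinations, and the wall-clock timing are assumptions) is to launch the same job through spark-submit with different settings and compare run times:

```python
import subprocess
import time

# (executor instances, cores per executor) combinations to try
combos = [(2, 8), (4, 4), (8, 2)]

for instances, cores in combos:
    start = time.time()
    subprocess.run(
        [
            "spark-submit",
            "--conf", f"spark.executor.instances={instances}",
            "--conf", f"spark.executor.cores={cores}",
            "job.py",  # hypothetical PySpark job to benchmark
        ],
        check=True,
    )
    print(f"instances={instances} cores={cores} took {time.time() - start:.1f}s")
```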

Upvotes: 3
