Bishamon Ten
Bishamon Ten

Reputation: 588

Apache Spark: How many partitions can a executor hold in spark.? How are the partitions distributed (mechanism) among the executors?

I am interested to know the following nitty gritties of spark parallelism and Partitioning

  1. How many partitions can a executor hold in spark?
  2. How are the partitions distributed (mechanism) among the executors?
  3. How to set the size of the partition. Would like to know the relevant the config parameter.
  4. Does executor store all the partitions in memory? If not when spilled to disk does it spill entire partition to disk or a part of partition to disk? 5 When there are 2 cores per executor but there are 5 partition in that executor then

Upvotes: 14

Views: 5838

Answers (1)

Ged
Ged

Reputation: 18003

Not quite the right way to look at it. An Executor holds nothing, it just does work.

  • A Partition is processed by a Core that has been assigned to an Executor. An Executor typically has 1 core but can have more than 1 such Core.

  • An App has Actions that translate to 1 or more Jobs.

  • A Job has Stages (based on Shuffle Boundaries).

  • Stages have Tasks, the number of these depends on number of Partitions.

  • Parallel processing of the Partitions depends on number of Cores allocated to Executors.

Spark is scalable in terms of Cores, Memory and Disk. The latter two in relation to your questions means that if the Partitions cannot all fit into Memory on the Worker for your Job, then that Partition or more will spill in its entirety to Disk.

Upvotes: 2

Related Questions