Reputation: 588
I would like to understand the following nitty-gritty details of Spark parallelism and partitioning
Upvotes: 14
Views: 5838
Reputation: 18003
Not quite the right way to look at it. An Executor holds nothing; it just does the work.
A Partition is processed by a Core that has been assigned to an Executor. An Executor typically has 1 Core but can have more than one.
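In other words, the number of Partitions that can be processed at the same time is Executors × Cores per Executor. A minimal sketch, assuming you set these values yourself (the app name and numbers below are illustrative only):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values: 4 Executors with 2 Cores each means up to
// 4 * 2 = 8 Partitions (i.e. 8 Tasks) can run in parallel.
val spark = SparkSession.builder()
  .appName("cores-per-executor-sketch")       // hypothetical app name
  .config("spark.executor.instances", "4")    // number of Executors
  .config("spark.executor.cores", "2")        // Cores per Executor
  .getOrCreate()
```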
An App has Actions that translate to 1 or more Jobs.
A Job has Stages (based on Shuffle Boundaries).
Stages have Tasks; the number of Tasks depends on the number of Partitions.
Parallel processing of the Partitions depends on the number of Cores allocated to the Executors (see the sketch below).
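To make the chain concrete, here is a hedged sketch (the input path and partition count are made up) in which one Action triggers one Job, the shuffle introduced by reduceByKey splits it into two Stages, and each Stage runs one Task per Partition:

```scala
// Hypothetical input path and partition count, for illustration only.
val counts = spark.sparkContext
  .textFile("hdfs:///data/input", minPartitions = 8) // 8 Partitions -> 8 Tasks in the first Stage
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)            // Shuffle Boundary -> second Stage

println(counts.getNumPartitions) // Tasks in the post-shuffle Stage
counts.count()                   // the Action that triggers the Job
```

With the 8 Cores from the configuration above, all 8 Tasks of the first Stage can run at once; with fewer Cores they would be scheduled in waves.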
Spark is scalable in terms of Cores, Memory and Disk. In relation to your questions, the latter two mean that if the Partitions for your Job cannot all fit into Memory on the Worker, then one or more of those Partitions will spill in their entirety to Disk.
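That Memory-to-Disk behaviour can also be made explicit when caching. An illustrative sketch using MEMORY_AND_DISK, under which Partitions that do not fit in Executor Memory are written to the Worker's local Disk instead of being dropped (shuffle data spills to Disk automatically when Memory runs short):

```scala
import org.apache.spark.storage.StorageLevel

// Partitions that fit in Memory stay there; oversized Partitions are
// written to the Worker's local Disk rather than being evicted outright.
val cached = counts.persist(StorageLevel.MEMORY_AND_DISK)
cached.count() // materialise the cache
```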
Upvotes: 2