Reputation: 855
I have this question:
If I partition my data and then perform an action such as reduce or fold, is the partitioning undone, so that I have to repartition after the action to get better performance?
Upvotes: 0
Views: 147
Reputation: 3217
Once data is partitioned, Spark keeps those partitions for further processing, including transformations and actions, unless you repartition or coalesce.
After the partitions are created, Spark runs one task per partition in each stage. Each task executes on an executor and applies that stage's transformations (or the final action) to its partition, and the resulting partitions then move on from one stage to the next.
Upvotes: 0
Reputation:
Actions in Spark return either:
- nothing (None in PySpark, void in Java, Unit in Scala) for actions used purely for side effects, like foreach, or
- a local, non-distributed object for other actions.
At the same time, actions don't modify the immutable objects on which they are called (with the exception of possible side effects such as caching, checkpointing, caching shuffle files, and computing statistics).
Therefore, partitioning is not really a meaningful concept here.
The results are not distributed Spark data structures, so partitioning doesn't apply to them, and the sources are not modified (they are descriptions of computations, not containers of data, anyway).
Upvotes: 2