Gakuo

Reputation: 855

Apache Spark partitioning

I have this question:

If I partition my data and then run an action such as reduce or fold, is the partitioning undone, so that I have to repartition after the action to get good performance?

Upvotes: 0

Views: 147

Answers (2)

skjagini

Reputation: 3217

Once data is partitioned, Spark maintains the partitions for further processing, including transformations and actions, unless you repartition or coalesce.

After the partitions are created, each stage runs one task per partition: an executor runs the task for the partition it is assigned, applying that stage's transformations or action. The resulting (updated) partitions then move on from one stage to the next.
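To make the "one task per partition" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: `partitions` and `run_stage` are hypothetical names made up for illustration, and real Spark schedules the tasks across executors rather than looping locally.

```python
# Toy stand-in for a partitioned dataset: a tuple of partitions.
# (Hypothetical model for illustration only, not Spark's API.)
partitions = ([1, 2, 3], [4, 5], [6, 7, 8, 9])


def run_stage(parts, fn):
    """Run one 'task' per partition and return the updated partitions."""
    return tuple([fn(x) for x in part] for part in parts)


# A per-element transformation keeps the same number of partitions.
doubled = run_stage(partitions, lambda x: x * 2)

print(len(partitions), len(doubled))
```

In this model the partition layout only changes if you explicitly rebuild it, which is roughly the role repartition and coalesce play in Spark.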

Upvotes: 0

user10968997

Reputation:

Actions in Spark return

  • Nothing (None in PySpark, void in Java, Unit in Scala) for actions used purely for side effects like foreach.

  • Local, non-distributed object for other actions.

At the same time, actions don't affect the immutable objects on which they are called (with the exception of possible side effects such as caching, checkpointing, caching shuffle files and computing statistics).

Therefore, partitioning is not really a meaningful concept here.

Results are not Spark distributed data structures, so partitioning doesn't apply to them. Sources are not modified (and are descriptions, not containers, anyway).
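A rough sketch of the same point, again as a toy model in plain Python rather than real Spark code. `ToyRDD` and its methods are hypothetical, loosely mirroring what `reduce` and `getNumPartitions` do on a PySpark RDD:

```python
from functools import reduce
from operator import add


class ToyRDD:
    """Hypothetical, immutable stand-in for a partitioned dataset."""

    def __init__(self, partitions):
        # Store partitions as tuples so they cannot be modified in place.
        self._partitions = tuple(tuple(p) for p in partitions)

    def num_partitions(self):
        return len(self._partitions)

    def reduce(self, op):
        # Reduce each non-empty partition, then combine the partial results.
        partials = [reduce(op, p) for p in self._partitions if p]
        return reduce(op, partials)


rdd = ToyRDD([[1, 2, 3], [4, 5], [6, 7, 8, 9]])

before = rdd.num_partitions()
total = rdd.reduce(add)        # the action returns a plain local value
after = rdd.num_partitions()   # the source is untouched

print(total, before, after)
```

The result is an ordinary local value with no partitioning of its own, and the source reports the same number of partitions before and after the action, so there is nothing to repartition.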

Upvotes: 2
