Reputation: 9771
What exactly does spark.sql.shuffle.partitions refer to? Is it the number of partitions that results from a wide transformation, or something that happens in the middle, such as some sort of intermediary partitioning before the result partitions of the wide transformation?
As I understand it, a wide transformation works like this:
Parent RDDs -> shuffle files -> Child RDDs
What does the spark.sql.shuffle.partitions parameter refer to here: the shuffle files, the child RDDs, or something else that I have overlooked?
Upvotes: 1
Views: 3269
Reputation: 74
This is already explained in the official docs:
spark.sql.shuffle.partitions (default: 200): Configures the number of partitions to use when shuffling data for joins or aggregations.
In other words, it is the number of partitions of the child Dataset.
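To make the parent -> shuffle files -> child chain from the question concrete, here is a toy model in plain Python. It is not Spark code: the `shuffle` function, the `num_shuffle_partitions` argument, and the use of `key % n` as the partitioner are all assumptions made for illustration. It only sketches why the setting determines the child side rather than the parent side.

```python
from collections import defaultdict


def shuffle(parent_partitions, num_shuffle_partitions):
    """Toy model of a shuffle (hypothetical helper, not a Spark API)."""
    # Map side: every parent partition splits its records into one bucket
    # per target partition. These buckets stand in for the shuffle files.
    shuffle_files = []
    for records in parent_partitions:
        buckets = defaultdict(list)
        for key, value in records:
            # Assumed partitioner: integer key modulo the partition count.
            buckets[key % num_shuffle_partitions].append((key, value))
        shuffle_files.append(buckets)

    # Reduce side: child partition i reads bucket i from every parent.
    child_partitions = []
    for i in range(num_shuffle_partitions):
        child = []
        for buckets in shuffle_files:
            child.extend(buckets.get(i, []))
        child_partitions.append(child)
    return child_partitions


# Three parent partitions of (key, value) records.
parents = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")], [(2, "e"), (4, "f")]]
children = shuffle(parents, num_shuffle_partitions=2)
print(len(parents), "parent partitions ->", len(children), "child partitions")
```

In this model the number of child partitions always equals `num_shuffle_partitions`, no matter how many parent partitions there are, and all records with the same key land in the same child partition. In real Spark the count can still end up lower than the configured value if adaptive query execution coalesces the post-shuffle partitions.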
Upvotes: 1