MaatDeamon
MaatDeamon

Reputation: 9771

What does spark.sql.shuffle.partitions exactly refer to?

What exactly does spark.sql.shuffle.partitions refer to? Are we talking of the number of partitions that is the results of a wide transformation, or something that happens in the middle as in some sort of intermediary partitioning before the result partition of the wide transformation?

Because in my understanding, as per a wide transformation we have

Parents RDDs -> shuffle files -> Child RDDs

What does the spark.sql.shuffle.partitions parameter refer to here? The shuffles files or the CHILD RDDs or something else that I ignored?

Upvotes: 1

Views: 3269

Answers (1)

user10407081
user10407081

Reputation: 74

This is already explained in the official docs:

spark.sql.shuffle.partitions 200 Configures the number of partitions to use when shuffling data for joins or aggregations.

In other words it is the number of partitions of the child Dataset.

Upvotes: 1

Related Questions