omkarerudkar

Reputation: 75

What's the difference between repartition() and spark.sql.shuffle.partitions?

What happens when we repartition data to a higher number of partitions than the spark.sql.shuffle.partitions property specifies? Are the two related?

Upvotes: 0

Views: 288

Answers (1)

Artem Astashov

Reputation: 726

It depends on which variant of Dataset.repartition you call.

If you call repartition(partitionExprs: Column*): Dataset[T], the number of partitions is taken from the spark.sql.shuffle.partitions setting.

If you call repartition(numPartitions: Int): Dataset[T], the number of partitions is the numPartitions value you pass, and spark.sql.shuffle.partitions is not consulted. So asking for more partitions than spark.sql.shuffle.partitions simply gives you that many partitions.
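To see the two variants side by side, here is a minimal sketch. The local master, the app name, the value 8 for spark.sql.shuffle.partitions, and the target of 20 partitions are all arbitrary choices for illustration. It also assumes Spark 3.x, where I turn off adaptive query execution, because AQE may otherwise coalesce the shuffle output of the column-based variant into fewer partitions than the setting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RepartitionDemo {
  def main(args: Array[String]): Unit = {
    // Illustrative settings only: local mode, 8 shuffle partitions, AQE disabled
    val spark = SparkSession.builder()
      .appName("repartition-demo")
      .master("local[2]")
      .config("spark.sql.shuffle.partitions", "8")
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()

    val df = spark.range(0, 1000).toDF("id")

    // repartition(partitionExprs: Column*): count comes from spark.sql.shuffle.partitions
    val byColumn = df.repartition(col("id"))
    println(s"by column: ${byColumn.rdd.getNumPartitions}") // 8 under the settings above

    // repartition(numPartitions: Int): count comes from the argument, even if it is higher
    val byCount = df.repartition(20)
    println(s"by count:  ${byCount.rdd.getNumPartitions}") // 20

    spark.stop()
  }
}
```

If you need both a specific count and hash partitioning by column, there is also the repartition(numPartitions: Int, partitionExprs: Column*) overload, which uses the explicit count.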

Upvotes: 1
