Reputation: 580
I've seen Databricks examples that use the partitionBy
method, but partitions are recommended to be around 128 MB. I'd think there would be a way to get as close to that as possible: take the total size of the data, divide it by 128 MB, and repartition into that number of partitions rather than partitioning by a column. Something along these lines is what I have in mind (rough, untested sketch; the paths are placeholders):
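```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder input/output paths
src_path = "/mnt/data/events"
dst_path = "/mnt/data/events_repartitioned"

# Estimate total size of the source files via the Hadoop FileSystem API
hadoop_conf = spark._jsc.hadoopConfiguration()
fs_path = spark._jvm.org.apache.hadoop.fs.Path(src_path)
fs = fs_path.getFileSystem(hadoop_conf)
total_bytes = fs.getContentSummary(fs_path).getLength()

# Target roughly 128 MB per partition
target_bytes = 128 * 1024 * 1024
num_partitions = max(1, int(total_bytes / target_bytes))

df = spark.read.parquet(src_path)
df.repartition(num_partitions).write.mode("overwrite").parquet(dst_path)
```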
Any suggestions for how this is achieved would be appreciated.
Upvotes: 2
Views: 761
Reputation: 2334
The setting spark.sql.files.maxPartitionBytes
does indeed control the maximum size of the partitions created when reading data into the Spark cluster. By using this configuration you can control the partitioning based on the size of the data. A minimal sketch of how it might be set (the input path is a placeholder; 128 MB also happens to be the default value):
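```python
from pyspark.sql import SparkSession

# Cap each read partition at ~128 MB
spark = (
    SparkSession.builder
    .config("spark.sql.files.maxPartitionBytes", 128 * 1024 * 1024)
    .getOrCreate()
)

# When reading splittable files (e.g. Parquet), Spark sizes each
# input partition at roughly maxPartitionBytes.
df = spark.read.parquet("/mnt/data/events")  # placeholder path
print(df.rdd.getNumPartitions())
```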
Upvotes: 2