Jamalan

How to partition a table in Databricks by data-size/row count not by column

I've seen Databricks examples that use the partitionBy method, but partitions are recommended to be around 128 MB. I'd think there was a way to achieve that as closely as possible: take the total size, divide it by 128 MB, and partition into that number of partitions rather than by a dimension.
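Roughly what I have in mind is something like this sketch (assuming a Delta table, where DESCRIBE DETAIL reports sizeInBytes; the table name is just a placeholder):

```python
# Target partition size: 128 MB
target_bytes = 128 * 1024 * 1024

# "my_table" is a placeholder name
df = spark.read.table("my_table")

# Total on-disk size from the Delta table's metadata
size_in_bytes = spark.sql("DESCRIBE DETAIL my_table").collect()[0]["sizeInBytes"]

# Number of partitions = total size / 128 MB, at least 1
num_partitions = max(1, int(size_in_bytes / target_bytes))

# Repartition by count rather than by a column
df = df.repartition(num_partitions)
```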

Any suggestions for how this is achieved would be appreciated.

Upvotes: 2

Views: 761

Answers (1)

The setting spark.sql.files.maxPartitionBytes does indeed affect the maximum size of the partitions when Spark reads data on the cluster. By using this configuration, you can control partitioning based on the size of the data.
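For example, a minimal sketch (the path is a placeholder; 134217728 bytes is 128 MB, which is also the default value):

```python
# Cap the size of each input partition at 128 MB when reading files
spark.conf.set("spark.sql.files.maxPartitionBytes", "134217728")

# "/path/to/data" is a placeholder
df = spark.read.parquet("/path/to/data")

# The number of partitions now follows from the data size
print(df.rdd.getNumPartitions())
```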

Upvotes: 2
