Reputation: 19
I have some data with 10000 rows. I want to split it into equal parts, not by any column. I want to write the output as five parts of 2000 rows each (2000, 2000, 2000, 2000, 2000).
I tried coalesce, and I have also tried partitioning, but the rows are not equally distributed.
final.coalesce(4).write.mode('overwrite').option("header", "true")
Upvotes: 0
Views: 34
Reputation: 404
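You will have to use repartition instead of coalesce. coalesce is faster because it merges existing partitions without a full shuffle, but that can leave the partitions unevenly sized, as you noticed. repartition does a full shuffle and, when given only a partition count, distributes the rows round-robin, so the partitions come out roughly equal in size.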
final = final.repartition(5)
should do the job for the numbers you give.
Upvotes: 1