How to repartition equally using PySpark SQL

I have some data with 10,000 rows. I want to split it into equal partitions, not by any column. The result should be 2000, 2000, 2000, 2000, 2000, i.e. 2000 rows in each partition.

I tried coalesce, and I have also tried repartitioning, but the rows are not equally distributed:

final.coalesce(4).write.mode('overwrite').option("header", "true").csv("output_path")  # "output_path" is a placeholder

Upvotes: 0

Views: 34

Answers (1)

harppu

Reputation: 404

You will have to use repartition instead of coalesce. coalesce is faster because it avoids a full shuffle, but that can leave the partitions unequally sized, as you noticed. repartition shuffles the data and, when no column is given, distributes the rows round-robin, which spreads them evenly.

final = final.repartition(5)

should do the job for the numbers you give.
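Here is a minimal, self-contained sketch of the whole round trip. Note the DataFrame is recreated here with spark.range as a stand-in for your data, and the output path "equal_output" is just an example:

from pyspark.sql import SparkSession
from pyspark.sql.functions import spark_partition_id

spark = SparkSession.builder.appName("equal-partitions").getOrCreate()

# Stand-in for your 10,000-row DataFrame
final = spark.range(10000)

# repartition(5) performs a full shuffle (round-robin when no column
# is given), so the 10,000 rows land as 2000 per partition
final = final.repartition(5)

# Verify the distribution: count the rows in each partition
final.groupBy(spark_partition_id().alias("partition")).count().show()

# Write the result; this produces one output file per partition
final.write.mode("overwrite").option("header", "true").csv("equal_output")

The spark_partition_id() check is optional, but it lets you confirm the 2000/2000/2000/2000/2000 split before writing.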

Upvotes: 1
