Reputation: 33
I have this code:
df.coalesce(40)
print(" after coalisce getting nb partition " + str(df.rdd.getNumPartitions()))
It does not print 40. Is there something I am doing wrong?
Upvotes: 1
Views: 1572
Reputation: 772
I would suggest going through the Spark architecture first and then getting familiar with the concept of immutable objects. That will help you better understand the answers the other users have given here.
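As a rough analogy in plain Python (not Spark itself), strings are immutable too: calling a method on one returns a new value and leaves the original alone. The variable names here are only for illustration.

```python
# Strings are immutable, so upper() cannot change `name` in place.
name = "spark"
name.upper()                # the returned string is discarded
unchanged = name            # still the original value

upper_name = name.upper()   # keep the returned value instead

print(unchanged)
print(upper_name)
```

A DataFrame behaves the same way with respect to transformations such as coalesce: the result has to be assigned to something, or chained, to be used.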
Upvotes: 0
Reputation: 452
Try doing this instead:
df_new=df.coalesce(40)
print(" after coalisce getting nb partition " + str(df_new.rdd.getNumPartitions()))
coalesce returns a new DataFrame rather than changing the original in place.
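If you want to see both counts side by side, a small helper along these lines might help. This is only a sketch: partition_counts is a made-up name, not part of PySpark, and it assumes df is an existing DataFrame. It relies only on DataFrame.coalesce and rdd.getNumPartitions().

```python
def partition_counts(df, target):
    """Return (partitions of df, partitions of df.coalesce(target)).

    Hypothetical helper for illustration; `df` is assumed to be a
    PySpark DataFrame that already exists.
    """
    before = df.rdd.getNumPartitions()
    coalesced = df.coalesce(target)   # new DataFrame; df is left untouched
    after = coalesced.rdd.getNumPartitions()
    return before, after

# Usage, assuming `df` is defined in your session:
# before, after = partition_counts(df, 40)
# print("before:", before, "after:", after)
```

Keep in mind that coalesce only merges partitions. If df has fewer than 40 partitions to begin with, coalesce(40) leaves the count as it is, and df.repartition(40) would be the call to use to increase it.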
Upvotes: 4
Reputation: 1416
The coalesce method returns a transformed DataFrame; it doesn't modify the original DataFrame. You have to read the number of partitions from the DataFrame that coalesce returns.
For example, a Spark shell running on an 8-core machine returns the following output:
scala> df.rdd.getNumPartitions
res3: Int = 8
After you apply coalesce, you get the required output:
scala> df.coalesce(1).rdd.getNumPartitions
res1: Int = 1
Upvotes: 3