momo

Reputation: 33

Spark coalesce not reducing partition count

I have this code

df.coalesce(40)

print(" after coalesce getting nb partition " + str(df.rdd.getNumPartitions()))

It does not print 40. Is there something I am doing wrong?

Upvotes: 1

Views: 1572

Answers (3)

Prashant

Reputation: 772

I would suggest going through the Spark architecture first and then trying to understand the concept of immutable objects. This will help you better understand the responses provided by the other users.
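To make the immutability point concrete, here is a small plain-Python analogy (not Spark code, just an illustration): strings are immutable too, so a method call returns a new value and leaves the original alone. The variable names are made up for the example.

```python
s = "spark"

s.upper()          # returns a new string, but the result is thrown away
unchanged = s      # s itself was never modified

s_new = s.upper()  # keep the returned value to see the change

print(unchanged, s_new)
```

A Spark DataFrame behaves the same way with respect to transformations such as coalesce: the call hands back a new object and the one you called it on stays as it was.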

Upvotes: 0

Kuldip Puri Tejaswi

Reputation: 452

Try doing this instead:

df_new = df.coalesce(40)
print(" after coalesce getting nb partition " + str(df_new.rdd.getNumPartitions()))

coalesce returns a new DataFrame rather than changing the existing one in place.

Upvotes: 4

Constantine

Reputation: 1416

The coalesce method returns a transformed DataFrame; it doesn't modify the original DataFrame. You have to read the number of partitions from the DataFrame that coalesce returns.

For example, a Spark shell running on an 8-core machine gives the following output:

 scala> df.rdd.getNumPartitions
 res3: Int = 8

After you apply coalesce, you get the expected output:

 scala> df.coalesce(1).rdd.getNumPartitions
 res1: Int = 1

Upvotes: 3
