Aviral Kumar

Reputation: 824

Converting Dataframe to RDD reduces partitions

In our code, Dataframe was created as :

DataFrame DF = hiveContext.sql("select * from table_instance");

When I convert my dataframe to rdd and try to get its number of partitions as

RDD<Row> newRDD = DF.rdd();
System.out.println(newRDD.getNumPartitions());

It reduces the number of partitions to 1 (1 is printed to the console). Originally my dataframe had 102 partitions.

UPDATE:

While reading, I repartitioned the dataframe:

DataFrame DF = hiveContext.sql("select * from table_instance").repartition(200);

and then converted it to an rdd, so it gave me 200 partitions. Does

JavaSparkContext

have a role to play in this? When we convert a dataframe to an rdd, is the default minimum partitions flag also considered at the spark context level?
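A minimal way to inspect those context-level defaults (assuming an existing JavaSparkContext named jsc; the name is just for illustration):

// Context-level defaults that some sources fall back on when no
// partition count is given explicitly:
System.out.println(jsc.defaultMinPartitions());
System.out.println(jsc.defaultParallelism());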

UPDATE:

I made a separate sample program in which I read the exact same table into a dataframe and converted it to an rdd. No extra stage was created for the RDD conversion, and the partition count was also correct. I am now wondering what I am doing differently in my main program.

Please let me know if my understanding is wrong here.

Upvotes: 3

Views: 1984

Answers (1)

Darshan

Reputation: 2333

It basically depends on the implementation of hiveContext.sql(). Since I am new to Hive, my guess is that hiveContext.sql() either doesn't know how, or is unable, to split the data present in the table.

For example, when you read a text file from HDFS, the spark context considers the number of HDFS blocks occupied by that file to determine the number of partitions.
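A sketch of that behaviour (the HDFS path and the jsc context are hypothetical; the minPartitions argument is only a lower bound):

// The partition count follows the file's input splits (typically one
// per HDFS block), never fewer than the requested minimum of 4.
JavaRDD<String> lines = jsc.textFile("hdfs:///tmp/some_file.txt", 4);
System.out.println(lines.getNumPartitions());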

What you did with repartition is the obvious solution for these kinds of problems. (Note: repartition may cause a shuffle operation if a proper partitioner is not used; a HashPartitioner is used by default.)

Coming to your doubt, hiveContext may consider the default minimum partitions property. But relying on the default property is not going to solve all your problems. For instance, if your hive table's size increases, your program will still use the default number of partitions.
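One way around that is to derive the partition count from the input size instead of hard-coding it. A rough sketch (the sizes below are assumptions for illustration; in practice you could read the table size from Hive table statistics):

long tableSizeBytes = 64L * 1024 * 1024 * 1024; // assumed table size: 64 GB
long bytesPerPartition = 128L * 1024 * 1024;    // target roughly one HDFS block per partition
int numPartitions = (int) Math.max(1, tableSizeBytes / bytesPerPartition);
DataFrame DF = hiveContext.sql("select * from table_instance").repartition(numPartitions);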

Update: Avoid shuffle during repartition

Define your custom partitioner:

import org.apache.spark.HashPartitioner;

public class MyPartitioner extends HashPartitioner {
    private final int partitions;

    public MyPartitioner(int partitions) {
        super(partitions); // HashPartitioner has no no-arg constructor
        this.partitions = partitions;
    }

    @Override
    public int numPartitions() {
        return this.partitions;
    }

    @Override
    public int getPartition(Object key) {
        if (key instanceof String) {
            return super.getPartition(key);
        } else if (key instanceof Integer) {
            // floorMod keeps the result non-negative even for negative keys
            return Math.floorMod((Integer) key, this.partitions);
        } else if (key instanceof Long) {
            return (int) Math.floorMod((Long) key, (long) this.partitions);
        }
        // TODO ... add more types
        return super.getPartition(key); // fall back to hashing for other key types
    }
}

Use your custom partitioner:

JavaPairRDD<Long, SparkDatoinDoc> pairRdd = hiveContext.sql("select * from table_instance")
        .javaRDD()  // DataFrame has no mapToPair; go through JavaRDD<Row> first
        .mapToPair(row -> { /* TODO ... expose the column as key */ });

pairRdd = pairRdd.partitionBy(new MyPartitioner(200));
// ... rest of processing
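To confirm the partitioner took effect, a quick check on the result (using the pairRdd from above):

System.out.println(pairRdd.getNumPartitions()); // expected: 200
System.out.println(pairRdd.partitioner());      // expected: an Optional containing MyPartitioner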

Upvotes: 1
