Leaving aside the database-connection aspects that usually come up when discussing mapPartitions for RDDs, and noting that I find the DataFrame internals harder to follow than the RDD abstraction:
From Spark 2.0 onwards, a DataFrame is a Dataset organized into named columns (in the Scala API, `DataFrame` is simply a type alias for `Dataset[Row]`). To answer your question: there is no need to convert DataFrames back to RDDs to get good performance and optimization, because Datasets and DataFrames are themselves much more efficient than raw RDDs, for the following reasons.
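A minimal sketch of the point above, assuming Spark 2.x or later with the Scala API and a local-mode SparkSession. The object name `DataFrameIsDataset`, the method `run`, and the sample data are hypothetical, chosen only for illustration. It shows that a DataFrame can be assigned to a `Dataset[Row]`, and that `mapPartitions` can be called on a Dataset directly, so per-partition work does not require going through `.rdd`.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DataFrameIsDataset {
  // Hypothetical helper: builds a tiny DataFrame and runs mapPartitions on it
  // through the Dataset API, returning the collected results.
  def run(): Array[Int] = {
    val spark = SparkSession.builder()
      .appName("dataframe-is-dataset")
      .master("local[*]") // local mode, assumed here for illustration only
      .getOrCreate()
    import spark.implicits._

    try {
      val df: DataFrame = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "value")

      // Compiles because DataFrame is a type alias for Dataset[Row].
      val rows: Dataset[Row] = df
      rows.printSchema()

      // as[(String, Int)] maps the columns to tuple fields by position;
      // mapPartitions then runs once per partition, with no .rdd conversion.
      val doubled: Dataset[Int] = df.as[(String, Int)].mapPartitions {
        (part: Iterator[(String, Int)]) =>
          // Any one-off per-partition setup (e.g. opening a connection) would go here.
          part.map { case (_, value) => value * 2 }
      }
      doubled.collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().sorted.mkString(", "))
}
```

Whether this is faster than an RDD-based approach depends on the workload: the lambda passed to `mapPartitions` is opaque to the Catalyst optimizer, so filters and projections written as column expressions are still preferable where they can express the same logic.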