user6882757

What is the difference between RDD and DataFrame in Spark?

I went through the linked question, "What's the difference between RDD and Dataframe in Spark?"

Upvotes: 0

Views: 265

Answers (1)

Salim

Reputation: 2178

For structured data you don't need to use RDDs. In Scala and Java you can use either the DataFrame or the Dataset API; in Python only the DataFrame API is available. Please see the official guide.

For unstructured data you will still need to use RDDs.

DataFrames generally provide the fastest performance (as per Matei's book).

The DataFrame API (part of Spark SQL) supports almost all SQL-like functions. You can also use Pandas alongside it; please see the Pandas guide.

The Koalas project lets you use pandas syntax on Spark. I would prefer it over Pandas. Here is the Koalas guide.

Upvotes: 1
