Michael

Reputation: 42050

What kind of data structure is a DataFrame in Spark?

This is a follow-up to my previous question.
A Row is an ordered set of key-value pairs. A DataFrame is a collection of Rows.
What kind of data structure is a DataFrame, actually? Is it a list, a set, or some other "collection"? Is it a relation, as in SQL?

Upvotes: 0

Views: 655

Answers (2)

Avishek Bhattacharya

Reputation: 6974

Although a DataFrame is an abstraction over an RDD, its internal representation is quite different from an RDD's.

An RDD is represented as Java objects and uses the JVM for all operations. A DataFrame, however, is represented in Tungsten, Spark's binary in-memory format.

Here is an excellent article which elaborates on how DataFrames are represented in Tungsten.

Upvotes: 1

OneCricketeer

Reputation: 191728

It's an abstraction over an RDD[Row] (or a Dataset[Row] in Spark 2), with a defined schema set through a series of Column classes.
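To make "rows plus a defined schema" concrete, here is a minimal pure-Python sketch. It is not Spark code: `Field` and `ToyFrame` are hypothetical names invented for illustration, and the model ignores distribution, laziness, and Spark's real type system. It only shows the shape of the idea: every row must match an ordered list of named, typed columns.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class Field:
    """One named, typed column (hypothetical, not a Spark class)."""
    name: str
    dtype: type


class ToyFrame:
    """Toy stand-in for a DataFrame: a schema plus rows that conform to it."""

    def __init__(self, schema: Sequence[Field], rows: Sequence[Sequence[Any]]):
        self.schema: List[Field] = list(schema)
        self.rows: List[Tuple[Any, ...]] = [tuple(r) for r in rows]
        for row in self.rows:
            self._check(row)

    @property
    def columns(self) -> List[str]:
        return [f.name for f in self.schema]

    def _check(self, row: Tuple[Any, ...]) -> None:
        # Reject rows whose width or value types do not match the schema.
        if len(row) != len(self.schema):
            raise ValueError(f"expected {len(self.schema)} values, got {len(row)}")
        for value, field in zip(row, self.schema):
            if not isinstance(value, field.dtype):
                raise TypeError(f"column {field.name!r} expects {field.dtype.__name__}")

    def select(self, *names: str) -> "ToyFrame":
        # Project a subset of columns; the result carries its own schema.
        idx = [self.columns.index(n) for n in names]
        return ToyFrame(
            [self.schema[i] for i in idx],
            [tuple(row[i] for i in idx) for row in self.rows],
        )


people = ToyFrame(
    [Field("name", str), Field("age", int)],
    [("Ann", 31), ("Bob", 45)],
)
ages = people.select("age")
```

The point of the sketch is that the schema travels with the data, so a projection such as `select` can produce a new frame with a narrower schema without inspecting the values.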

Is it a list, set, or other "collection"?

Not in the Java sense of those words. Like an RDD, it is none of those, but rather a "lazy collection".
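As a rough illustration of what "lazy collection" means, here is a small pure-Python sketch. It is an assumption-laden toy, not Spark's implementation: `LazyRows`, `source`, and `reads` are hypothetical names. Transformations only record a step, and nothing touches the data until an action (`collect`) is called.

```python
from typing import Any, Callable, Iterable, List, Tuple


class LazyRows:
    """Toy lazy collection: transformations are recorded, not executed."""

    def __init__(self, source: Callable[[], Iterable[Any]],
                 steps: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source   # called only when an action runs
        self._steps = steps     # recorded plan of (kind, function) pairs

    def filter(self, pred: Callable[[Any], bool]) -> "LazyRows":
        return LazyRows(self._source, self._steps + (("filter", pred),))

    def map(self, fn: Callable[[Any], Any]) -> "LazyRows":
        return LazyRows(self._source, self._steps + (("map", fn),))

    def collect(self) -> List[Any]:
        # The action: read the source, then apply the recorded steps in order.
        rows = iter(self._source())
        for kind, fn in self._steps:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return list(rows)


reads = []  # records each time the source is actually read


def source():
    reads.append(1)
    return [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 45}]


older = LazyRows(source).filter(lambda r: r["age"] > 40).map(lambda r: r["name"])
reads_before = len(reads)   # still 0: building the plan read nothing
names = older.collect()     # the source is read once, here
```

In this toy, `older` is just a description of work to do, which is why it behaves like neither a list nor a set until it is collected.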

Is it a relation as in SQL?

You're welcome to run Spark SQL over a DataFrame, but it's a table. Relations are optional.

Upvotes: 1
