Vajra

Reputation: 461

How to convert DataFrame to RDD in Scala?

Can someone please share how to convert a DataFrame to an RDD?

Upvotes: 45

Views: 108912

Answers (3)

Ishan Kumar

Reputation: 2082

I was just looking for my answer and found this post.

Jean's answer is absolutely correct. Adding to that, df.rdd returns an RDD[Row]. I needed to apply split() once I had the RDD, and for that the RDD[Row] has to be converted to an RDD[String]:

// Assumes tags is a string column; Row.toString would wrap the value in [ ].
val opt = spark.sql("select tags from cvs").rdd.map(row => row.getString(0))
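
To make the RDD[Row] to RDD[String] step concrete, here is a minimal sketch that also performs the split(). The local SparkSession, the sample data standing in for the cvs table, the comma delimiter, and the names RowsToStrings and tagsAsStrings are all assumptions made for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object RowsToStrings {
  // Reads column 0 of each Row as a String (assumes `tags` is a string column).
  def tagsAsStrings(spark: SparkSession): RDD[String] =
    spark.sql("select tags from cvs").rdd.map(row => row.getString(0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rows-to-strings")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the real "cvs" table.
    Seq("scala,spark", "rdd,dataframe").toDF("tags").createOrReplaceTempView("cvs")

    // Once the elements are plain Strings, split() applies directly.
    val tokens: RDD[String] = tagsAsStrings(spark).flatMap(_.split(","))
    tokens.collect().foreach(println)

    spark.stop()
  }
}
```

Reading the column with getString(0) rather than calling toString on the Row avoids the square brackets that Row.toString puts around the values, which would otherwise end up in the first and last tokens after the split.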

Upvotes: 4

Random Certainty

Reputation: 475

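Use df.rdd.map(row => ...) to convert the DataFrame to an RDD if you want to map each row to a different RDD element. (In Spark 1.x, df.map returned an RDD directly; from Spark 2.0 on it returns a Dataset, so go through .rdd first.) For example,

df.rdd.map(row => (row(0), row(1)))

gives you a pair RDD where the first column of the df is the key and the second column is the value. Row indices start at 0.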
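
A typed variant of the same idea can be sketched as follows. The column types (Int key, String value), the sample data, and the names PairRddExample and toPairs are assumptions for illustration. The sketch calls .rdd before map so that the result is an RDD rather than a Dataset.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object PairRddExample {
  // Key = first column (index 0), value = second column (index 1).
  // Assumes the columns are an Int and a String respectively.
  def toPairs(df: DataFrame): RDD[(Int, String)] =
    df.rdd.map(row => (row.getInt(0), row.getString(1)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pair-rdd")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq((1, "scala"), (2, "spark"), (1, "rdd")).toDF("id", "tag")

    // Pair-RDD operations such as reduceByKey are available on RDD[(K, V)].
    val joined: RDD[(Int, String)] = toPairs(df).reduceByKey(_ + "," + _)
    joined.collect().foreach(println)

    spark.stop()
  }
}
```

Using getInt and getString instead of row(i) keeps the element type as (Int, String) rather than (Any, Any), which makes later key-based operations easier to type-check.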

Upvotes: 3

Jean Logeart

Reputation: 53839

Simply:

val rows: RDD[Row] = df.rdd
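
For completeness, a self-contained sketch of that one-liner with the imports it needs. The local SparkSession, the sample data, and the names DataFrameToRdd and toRows are assumptions for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object DataFrameToRdd {
  // The conversion itself is just the .rdd accessor on the DataFrame.
  def toRows(df: DataFrame): RDD[Row] = df.rdd

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("df-to-rdd")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df: DataFrame = Seq((1, "scala"), (2, "spark")).toDF("id", "tag")

    val rows: RDD[Row] = toRows(df)

    // Fields can be read by position or, since these rows carry the schema, by name.
    rows.collect().foreach { row =>
      println(s"${row.getInt(0)} -> ${row.getAs[String]("tag")}")
    }

    spark.stop()
  }
}
```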

Upvotes: 71
