Carl

Reputation: 233

Spark 2.0 Scala - RDD.toDF()

I am working with Spark 2.0 Scala. I am able to convert an RDD to a DataFrame using the toDF() method.

val rdd = sc.textFile("/pathtologfile/logfile.txt")
val df = rdd.toDF()

But for the life of me I cannot find where this is in the API docs. It is not under RDD, but it is under Dataset (link 1). However, I have an RDD, not a Dataset.

Also I can't see it under implicits (link 2).

So please help me understand why toDF() can be called for my RDD. Where is this method being inherited from?

Upvotes: 23

Views: 49251

Answers (4)

DanielVL

Reputation: 249

Yes, you need to import the sqlContext implicits before you call toDF on your RDDs, like this:

val sqlContext = //create sqlContext

import sqlContext.implicits._

val df = rdd.toDF()
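For completeness, a minimal Spark 2 sketch of the same idea (the app name and file path are just placeholders; in Spark 2 a SparkSession gives you the sqlContext and equivalent implicits):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RddToDf")        // placeholder app name
  .master("local[*]")        // placeholder master for a local run
  .getOrCreate()

import spark.implicits._     // equivalent to spark.sqlContext.implicits._

val rdd = spark.sparkContext.textFile("/pathtologfile/logfile.txt")
val df = rdd.toDF()          // compiles only because the implicits are in scope
df.printSchema()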

Upvotes: 5

Gautam De

Reputation: 49

I have done just this with Spark 2, and it worked:

val orders = sc.textFile("/user/gd/orders")
val ordersDF = orders.toDF()

Upvotes: 1

Raphael Roth

Reputation: 27373

It's coming from here:

Spark 2 API

Explanation: if you import sqlContext.implicits._, you get an implicit method that converts an RDD to a DatasetHolder (rddToDatasetHolder); toDF is then called on the DatasetHolder.
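Sketched out explicitly (assuming a Spark 2 SparkSession named spark with its implicits imported; the column name is illustrative), the chain the compiler inserts looks roughly like this:

import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3))

// rdd.toDF("n") is roughly sugar for the two steps below:
val holder = rddToDatasetHolder(rdd)   // implicit conversion brought in by the import
val df = holder.toDF("n")              // DatasetHolder.toDF, optionally with column names
df.show()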

Upvotes: 21

user3749126

Reputation: 59

Yes, I finally found peace of mind on this issue. It was troubling me badly, and this post is a life saver. I was trying to generically load data from log files into case class objects held in a mutable List, with the idea of finally converting the list into a DataFrame. However, since the list was mutable and Spark 2.1.1 has changed the toDF implementation, for whatever reason the list was not getting converted. I was even considering saving the data to a file and loading it back with .read, but five minutes ago this post saved my day.

I did it the exact same way as described.

After loading the data into the mutable list, I immediately used:

import spark.sqlContext.implicits._
val df = <mutable list object>.toDF 
df.show()
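For reference, a small self-contained sketch of that pattern (spark is a Spark 2 SparkSession and LogEntry is just an illustrative case class):

import scala.collection.mutable.ListBuffer
import spark.implicits._

case class LogEntry(level: String, message: String)

val entries = ListBuffer[LogEntry]()
entries += LogEntry("INFO", "started")
entries += LogEntry("WARN", "low disk")

// toDF on a local (even mutable) Seq comes from localSeqToDatasetHolder in the same implicits
val df = entries.toDF()
df.show()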

Upvotes: 2
