Reputation: 3470
I have an RDD of Rows called RowRDD, and I am simply trying to convert it into a DataFrame. The examples I have found online suggest calling RowRDD.toDF(), but when I try it I get this error:
value toDF is not a member of org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
Upvotes: 1
Views: 2844
Reputation: 330193
It doesn't work because Row is not a Product type, and createDataFrame with a single RDD argument is defined only for RDD[A] where A <: Product.
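For contrast, here is a minimal sketch of the case where toDF does work: the rows are first mapped to a case class, which is a Product type. It assumes the Spark 1.3+ SQLContext API and a local master. The Person class, its fields, and the sample values are made up for illustration and are not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

// Hypothetical record type. Case classes extend Product, so a schema can be inferred.
// It is defined at the top level so that a TypeTag is available for it.
case class Person(name: String, age: Int)

object ToDFExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("toDF-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Brings the implicit conversion that adds toDF to RDD[A] where A <: Product
    import sqlContext.implicits._

    // Illustrative RDD[Row]; in practice this is the RDD from the question
    val rowRdd = sc.parallelize(Seq(Row("alice", 30), Row("bob", 25)))

    // Convert each untyped Row into a typed Person, then call toDF
    val df = rowRdd.map(r => Person(r.getString(0), r.getInt(1))).toDF()
    df.printSchema()

    sc.stop()
  }
}
```

This only makes sense when the column types are known up front; otherwise passing an explicit schema to createDataFrame is the more direct route.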
If you want to use RDD[Row], you have to provide a schema as the second argument. If you think about it, this should be obvious: Row is just a container of Any, and as such it doesn't provide enough information for schema inference.
Assuming this is the same RDD as defined in your previous question, the schema is easy to generate:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val rowRdd: RDD[Row] = ???

// One non-nullable string column per field, named _1, _2, ...
val schema = StructType(
  (1 to rowRdd.first.size).map(i => StructField(s"_$i", StringType, false))
)
val df = sqlContext.createDataFrame(rowRdd, schema)
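If the rows hold more than strings, the schema has to be written out so that each field's type matches the values actually stored in the Row. A rough sketch, assuming the Spark 1.3+ SQLContext API; the column names, types, and sample rows here are hypothetical and not taken from the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types._

object ExplicitSchemaExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("explicit-schema").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Illustrative rows: a string column and an integer column
    val rowRdd = sc.parallelize(Seq(Row("alice", 30), Row("bob", 25)))

    // Each StructField must describe the value at the same position in the Row
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)
    ))

    val df = sqlContext.createDataFrame(rowRdd, schema)
    df.printSchema()

    sc.stop()
  }
}
```

A field type that does not match the data is typically reported only when the DataFrame is evaluated, not when createDataFrame is called.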
Upvotes: 6