ulrich

Reputation: 3587

Converting an RDD into a DataFrame: Int vs Double

Why is it possible to convert an RDD[Int] into a DataFrame using the implicit method

import sqlContext.implicits._

// Convert an RDD[Int] to a DataFrame
val rdd1 = sc.parallelize(Array(4,5,6)).toDF()
rdd1.show()

rdd1: org.apache.spark.sql.DataFrame = [_1: int]
+---+
| _1|
+---+
|  4|
|  5|
|  6|
+---+

but RDD[Double] throws an error:

val rdd2 = sc.parallelize(Array(1.1,2.34,3.4)).toDF()
error: value toDF is not a member of org.apache.spark.rdd.RDD[Double]

Upvotes: 1

Views: 756

Answers (1)

zero323

Reputation: 330283

Spark 2.x

In Spark 2.x, toDF uses SparkSession.implicits, which provides rddToDatasetHolder and localSeqToDatasetHolder for any type with an Encoder, so with

val spark: SparkSession = ???
import spark.implicits._

both:

Seq(1.1, 2.34, 3.4).toDF()

and

sc.parallelize(Seq(1.1, 2.34, 3.4)).toDF()

are valid.
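
For example, here is a minimal self-contained sketch (the local master, app name, and column name "value" are illustrative assumptions, not part of the original answer); the implicit Encoder[Double] is what makes both conversions compile:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("toDF-demo")
  .getOrCreate()
import spark.implicits._

// rddToDatasetHolder applies because an Encoder[Double] is in scope;
// toDF also accepts optional column names.
val df = spark.sparkContext.parallelize(Seq(1.1, 2.34, 3.4)).toDF("value")

df.printSchema()
// root
//  |-- value: double (nullable = false)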

Spark 1.x

It is not possible. Excluding Product types, SQLContext provides implicit conversions only for RDD[Int] (intRddToDataFrameHolder), RDD[Long] (longRddToDataFrameHolder) and RDD[String] (stringRddToDataFrameHolder). You can always map to RDD[Tuple1[Double]] first:

sc.parallelize(Seq(1.1, 2.34, 3.4)).map(Tuple1(_)).toDF()
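
Since Product types are covered by the existing implicits, wrapping each value in a case class also works and gives the column a meaningful name. A minimal sketch, assuming the same import sqlContext.implicits._ as in the question; the case class Value is illustrative:

case class Value(value: Double)

// Value is a Product, so rddToDataFrameHolder applies and the
// field name becomes the column name.
sc.parallelize(Seq(1.1, 2.34, 3.4)).map(Value(_)).toDF().show()

+-----+
|value|
+-----+
|  1.1|
| 2.34|
|  3.4|
+-----+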

Upvotes: 4
