Reputation: 3587
Why is it possible to convert an RDD[Int] into a DataFrame using the implicit method
import sqlContext.implicits._
// Convert an RDD[Int] to a DataFrame
val rdd1 = sc.parallelize(Array(4,5,6)).toDF()
rdd1.show()
rdd1: org.apache.spark.sql.DataFrame = [_1: int]
+---+
| _1|
+---+
| 4|
| 5|
| 6|
+---+
but RDD[Double] throws an error:
val rdd2 = sc.parallelize(Array(1.1,2.34,3.4)).toDF()
error: value toDF is not a member of org.apache.spark.rdd.RDD[Double]
Upvotes: 1
Views: 756
Reputation: 330283
Spark 2.x

In Spark 2.x, toDF uses SparkSession.implicits, which provides rddToDatasetHolder and localSeqToDatasetHolder for any type with an Encoder, so with

val spark: SparkSession = ???
import spark.implicits._

both:

Seq(1.1, 2.34, 3.4).toDF()

and

sc.parallelize(Seq(1.1, 2.34, 3.4)).toDF()

are valid.
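For a self-contained illustration (a minimal sketch; the master setting, app name, and column name below are placeholder values, not from the original), the Spark 2.x version works end to end like this:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")        // placeholder: local mode for illustration
  .appName("toDF-example")   // placeholder app name
  .getOrCreate()
import spark.implicits._

// Works because an Encoder[Double] is available
val df = spark.sparkContext.parallelize(Seq(1.1, 2.34, 3.4)).toDF("value")
df.show()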
Spark 1.x

It is not possible. Excluding Product types, SQLContext provides implicit conversions only for RDD[Int] (intRddToDataFrameHolder), RDD[Long] (longRddToDataFrameHolder) and RDD[String] (stringRddToDataFrameHolder). You can always map to RDD[Tuple1[Double]] first:
sc.parallelize(Seq(1.1, 2.34, 3.4)).map(Tuple1(_)).toDF()
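As a small follow-up sketch (the column name value is arbitrary, not from the original), toDF also accepts column names, which avoids the default _1:

sc.parallelize(Seq(1.1, 2.34, 3.4))
  .map(Tuple1(_))
  .toDF("value")  // names the single column "value" instead of _1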
Upvotes: 4