I'm trying to take a hardcoded String and turn it into a 1-row Spark DataFrame (with a single column of type StringType) such that:
String fizz = "buzz"
would result in a DataFrame whose .show() output looks like:
+-----+
| fizz|
+-----+
| buzz|
+-----+
My best attempt thus far has been:
val rawData = List("fizz")
val df = sqlContext.sparkContext.parallelize(Seq(rawData)).toDF()
df.show()
But I get the following exception at runtime:
java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:413)
at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:155)
Any ideas as to where I'm going awry? Also, how do I set "buzz" as the row value for the fizz column?
Trying:
sqlContext.sparkContext.parallelize(rawData).toDF()
I get a DF that looks like:
+----+
|  _1|
+----+
|buzz|
+----+
Upvotes: 7
Views: 29275
In Java, the following works:
List<String> textList = Collections.singletonList("yourString");
SQLContext sqlContext = new SQLContext(sparkContext);
Dataset<Row> data = sqlContext
.createDataset(textList, Encoders.STRING())
.withColumnRenamed("value", "text");
Upvotes: 0
Try:
sqlContext.sparkContext.parallelize(rawData).toDF()
In Spark 2.0 and later you can call toDF directly on the local collection:
import spark.implicits._
rawData.toDF
Optionally, pass the column names to toDF:
sqlContext.sparkContext.parallelize(rawData).toDF("fizz")
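As for the original error: wrapping the list in Seq(rawData) gives an RDD whose single element is itself a List[String], so Spark presumably infers an ArrayType for the element rather than a row-like StructType, which is what the ClassCastException complains about. Putting it together, here is a minimal sketch for Spark 2.x, assuming a local SparkSession. The object name SingleStringDataFrame, the helper build, and the master/appName settings are made up for illustration and are not part of the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleStringDataFrame {
  // Hypothetical helper: builds a 1-row, 1-column DataFrame of strings.
  // `column` becomes the column name and `value` the single row value.
  def build(spark: SparkSession, column: String, value: String): DataFrame = {
    import spark.implicits._
    // Each element of the Seq becomes one row; toDF(column) names the column.
    Seq(value).toDF(column)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-string-df")
      .getOrCreate()

    val df = build(spark, "fizz", "buzz")
    df.printSchema() // one string column named fizz
    df.show()        // one row containing buzz

    spark.stop()
  }
}
```

Note that the column name goes into toDF and the value goes into the collection, so the name and the value never need to live in the same list.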
Upvotes: 9