Reputation: 41
I am new to Spark (using Scala) and am trying a few things such as RDD-to-DataFrame conversion. I have a String variable, for example:
val myString = "apple, boy, cat, dog"
How can I convert myString to an org.apache.spark.sql.Row?
I have tried the things below, but when I print the length of the created row I get 1 where I should get 4.
val row = org.apache.spark.sql.Row.apply(myString)
val row1 = org.apache.spark.sql.Row(myString)
val row2 = org.apache.spark.sql.Row.fromSeq(Seq(myString.split(',')))
Upvotes: 0
Views: 2447
Reputation: 41957
The correct way is:
org.apache.spark.sql.Row.fromSeq(myString.split(','))
//res0: org.apache.spark.sql.Row = [apple, boy, cat, dog]
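where myString.split(',') is an Array[String] that is implicitly converted to a Seq. Wrapping it in Seq(...), as in the question, produces a Seq holding a single Array element, which is why the row length was 1.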
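As a minimal sketch of the difference (assuming a spark-shell session or Spark on the classpath; the names wrapped, fields, and row are illustrative only), the following contrasts the question's third attempt with passing the split array directly. The .map(_.trim) step is an addition that is not part of the answer above; it removes the space that follows each comma in the input.

```scala
import org.apache.spark.sql.Row

val myString = "apple, boy, cat, dog"

// Seq(array) is a one-element Seq[Array[String]], so the Row has a single field.
val wrapped = Row.fromSeq(Seq(myString.split(',')))
println(wrapped.length)   // 1

// Passing the tokens themselves gives one field per token.
// trim is an assumption about the desired result: it drops the leading spaces.
val fields = myString.split(',').map(_.trim)
val row = Row.fromSeq(fields.toSeq)
println(row.length)       // 4
println(row.getString(1)) // boy
```

Calling .toSeq explicitly avoids relying on the implicit Array-to-Seq conversion, which may make the intent easier to read.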
If you want to create a DataFrame, then:
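// StructType, StructField and StringType live in the types package
import org.apache.spark.sql.types.{StructType, StructField, StringType}
val myString = "apple, boy, cat, dog"
// createDataFrame expects an RDD[Row], so wrap the single Row in an RDD
val row2 = sc.parallelize(Seq(org.apache.spark.sql.Row.fromSeq(myString.split(','))))
sqlContext.createDataFrame(row2, StructType(Seq(StructField("name1", StringType, true), StructField("name2", StringType), StructField("name3", StringType), StructField("name4", StringType)))).show(false)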
which should give you
+-----+-----+-----+-----+
|name1|name2|name3|name4|
+-----+-----+-----+-----+
|apple| boy | cat | dog |
+-----+-----+-----+-----+
where StructType(Seq(StructField("name1", StringType, true), StructField("name2", StringType), StructField("name3", StringType), StructField("name4", StringType)))
creates the schema: four string columns, each nullable (the third StructField argument, nullable, defaults to true).
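If the number of values is not fixed, the schema could also be derived from the split result instead of being written out by hand. This is only a sketch under some assumptions: it uses SparkSession (Spark 2.0 or later) rather than the sqlContext shown above, runs in local mode, trims the tokens, and keeps the name1..nameN column naming purely for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("row-example").getOrCreate()

val fields = "apple, boy, cat, dog".split(',').map(_.trim)

// One nullable string column per token: name1, name2, ...
val schema = StructType(
  fields.indices.map(i => StructField(s"name${i + 1}", StringType, nullable = true))
)

// A single Row wrapped in an RDD[Row], paired with the generated schema.
val rowRdd = spark.sparkContext.parallelize(Seq(Row.fromSeq(fields.toSeq)))
val df = spark.createDataFrame(rowRdd, schema)
df.show(false)
```

The schema must have exactly as many fields as the Row has values, which is why both are built from the same fields array here.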
Upvotes: 3