D V N

Reputation: 41

Creating Spark Row from CSV String

I am new to Spark (using Scala) and am trying a few things, such as RDD-to-DataFrame conversion. For example, I have a String variable:

val myString = "apple, boy, cat, dog"

How can I convert myString to an org.apache.spark.sql.Row?

I have tried the approaches below, but when I print the length of the created row I get 1 instead of the expected 4.

val row = org.apache.spark.sql.Row.apply(myString)

val row1 = org.apache.spark.sql.Row(myString) 

val row2 = org.apache.spark.sql.Row.fromSeq(Seq(myString.split(',')))
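Why all three attempts give length 1: in Spark, Row.apply (which is also what Row(x) calls) takes a varargs Any* parameter, so passing one String produces one field. Seq(myString.split(',')) is a Seq holding a single element, namely the whole array, so Row.fromSeq also sees one field. The sketch below does not use Spark; fieldCount and fieldCountFromSeq are hypothetical stand-ins that only mirror the parameter shapes of Row.apply and Row.fromSeq so the arity can be checked in a plain Scala REPL or script.

```scala
// Hypothetical stand-ins mirroring the parameter shapes of
// Row.apply(values: Any*) and Row.fromSeq(values: Seq[Any]); they only count fields.
def fieldCount(values: Any*): Int = values.length
def fieldCountFromSeq(values: Seq[Any]): Int = values.length

val myString = "apple, boy, cat, dog"
val parts: Array[String] = myString.split(',')

// One String argument is one vararg, so one field (like Row(myString)).
val a = fieldCount(myString)
// Seq(parts) is a Seq with ONE element: the whole array.
val b = fieldCountFromSeq(Seq(parts))
// Passing the array's elements as the Seq gives one field per element.
val c = fieldCountFromSeq(parts.toSeq)
// Expanding the array into the varargs also gives one field per element.
val d = fieldCount(parts: _*)

println(s"$a $b $c $d") // prints "1 1 4 4"
```

By the same reasoning, Row(myString.split(','): _*) should also yield a four-field Row, since it expands the array into Row.apply's varargs.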

Upvotes: 0

Views: 2447

Answers (1)

Ramesh Maharjan

Reputation: 41957

The correct way is:

org.apache.spark.sql.Row.fromSeq(myString.split(','))
//res0: org.apache.spark.sql.Row = [apple, boy, cat, dog]

Here myString.split(',') returns an Array[String], which is implicitly converted to a Seq, so each array element becomes a separate field of the Row (length 4).
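Note that splitting on ',' alone keeps the space that follows each comma, so the second to fourth fields are " boy", " cat" and " dog". If you want clean values, one variant (not part of the original answer) is to trim each piece before building the Row, e.g. Row.fromSeq(myString.split(',').map(_.trim)). The sketch below shows only the String handling in plain Scala, so it runs without Spark; the names raw, trimmed and viaRegex are illustrative.

```scala
val myString = "apple, boy, cat, dog"

// Splitting on ',' alone keeps the space that follows each comma.
val raw: Array[String] = myString.split(',')
// Trimming each piece removes that surrounding whitespace.
val trimmed: Array[String] = raw.map(_.trim)
// Alternatively, split on a regex that also consumes whitespace around the comma.
val viaRegex: Array[String] = myString.split("\\s*,\\s*")

println(raw.mkString("[", "|", "]"))     // prints "[apple| boy| cat| dog]"
println(trimmed.mkString("[", "|", "]")) // prints "[apple|boy|cat|dog]"
```

Either trimmed or viaRegex can then be passed to Row.fromSeq in place of myString.split(',').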

If you want to create a DataFrame from it:

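import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val myString = "apple, boy, cat, dog"
// assumes an existing SparkContext sc and SQLContext sqlContext; builds an RDD[Row] holding one four-field Row
val row2 = sc.parallelize(Seq(Row.fromSeq(myString.split(','))))
val schema = StructType(Seq(StructField("name1", StringType, true), StructField("name2", StringType), StructField("name3", StringType), StructField("name4", StringType)))
sqlContext.createDataFrame(row2, schema).show(false)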

which should print:

+-----+-----+-----+-----+
|name1|name2|name3|name4|
+-----+-----+-----+-----+
|apple| boy | cat | dog |
+-----+-----+-----+-----+

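Here StructType(Seq(StructField("name1", StringType, true), StructField("name2", StringType), StructField("name3", StringType), StructField("name4", StringType))) creates the schema: four string columns, name1 to name4, all nullable (StructField's nullable parameter defaults to true). The number of fields in the schema must match the number of fields in each Row.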

Upvotes: 3
