Reputation: 57
I am currently using Spark Streaming and Spark SQL for my current project. Is there a way to convert an Array[Object] to either an RDD[Object] or a DataFrame? I am doing something like the following:
val myData = myDf.distinct()
  .collect()
  .map { row =>
    new myObject(row.getAs[String]("id"), row.getAs[String]("name"))
  }
The myData in the code snippet above will be an Array[myObject]. How do I turn it into an RDD[myObject], or directly into a DataFrame, for the next step of the execution?
Upvotes: 0
Views: 3993
Reputation: 1529
import org.apache.spark.sql.Row

case class myObject(id: String, name: String)

// Pattern-match each distinct Row into the case class;
// this avoids collect(), so the data never leaves the cluster.
val myData = myDf.distinct.map {
  case Row(id: String, name: String) => myObject(id, name)
}
Upvotes: 2
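If a DataFrame is needed again for the next step, the RDD of case-class objects from the answer above can be converted back. A minimal sketch, assuming a Spark 1.x sqlContext is in scope (the name myData comes from the answer's snippet):

```scala
// importing the implicits enables .toDF() on an RDD of case-class objects
import sqlContext.implicits._

val myDataDf = myData.toDF()
```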
Reputation: 57
I think I managed to parse it into an RDD[myObject]. I hope this is the right way to do it:
val myData = myDf.distinct()
  .collect()
  .map { row =>
    new myObject(row.getAs[String]("id"), row.getAs[String]("name"))
  }
// rdd.sparkContext reuses the running context,
// since this code snippet is inside a foreachRDD clause.
val myDataRDD = rdd.sparkContext.parallelize(myData)
Upvotes: 1
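For context, the snippet above sits inside a foreachRDD block, where the batch RDD already carries a reference to the running SparkContext. A minimal sketch of that pattern, assuming myStream is the DStream and myData is the Array[myObject] built from the batch as shown above:

```scala
myStream.foreachRDD { rdd =>
  // reuse the existing SparkContext from the batch RDD;
  // never construct a new SparkContext inside the stream
  val myDataRDD = rdd.sparkContext.parallelize(myData)
}
```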