Deric Dominic

Reputation: 57

How to convert DataFrame or RDD[object] to Array[Object] in Spark?

I am using Spark Streaming and Spark SQL in my current project. Is there a way to convert an Array[Object] to either an RDD[object] or a DataFrame? I am doing something like the following:

val myData = myDf.distinct()
                 .collect()
                 .map{ row => 
                   new myObject(row.getAs[String]("id"), row.getAs[String]("name"))
                 }

The myData in the code snippet above will be an Array[myObject]. How do I turn it into an RDD[myObject], or directly into a DataFrame, for the next stage of processing?

Upvotes: 0

Views: 3993

Answers (2)

Ton Torres

Reputation: 1529

import org.apache.spark.sql.Row

case class myObject(id:String, name:String)

// Pattern match each Row on its two columns and build a myObject directly;
// with a Spark 1.x DataFrame this map yields an RDD[myObject] without
// collecting anything to the driver.
val myData = myDf.distinct.map {
  case Row(id: String, name: String) => myObject(id, name)
}
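
If a DataFrame rather than an RDD[myObject] is needed for the next step, the mapped RDD can be converted back with toDF. A minimal sketch, assuming Spark 1.x and that a SQLContext named sqlContext is in scope (that name is an assumption, not something from the answer):

import sqlContext.implicits._ // brings toDF into scope for RDDs of case classes

// myData is the RDD[myObject] from the snippet above; toDF derives the
// column names (id, name) from the case class fields
val myDataDf = myData.toDF()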

Upvotes: 2

Deric Dominic

Reputation: 57

I think I managed to convert it to an RDD[myObject]. I hope this is the right way to do it.

val myData = myDf.distinct()
                 .collect()
                 .map { row =>
                   new myObject(row.getAs[String]("id"), row.getAs[String]("name"))
                 }
// rdd.sparkContext is used because this snippet runs inside a foreachRDD block
val myDataRDD = rdd.sparkContext.parallelize(myData)
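
For context, a minimal sketch of how this could sit inside the foreachRDD block, assuming a DStream named myStream and that myDf has already been built from the incoming rdd (both names are illustrative, not from the original code):

myStream.foreachRDD { rdd =>
  // myDf is assumed to have been created from rdd earlier in this block
  val myData = myDf.distinct()
                   .collect()
                   .map { row =>
                     new myObject(row.getAs[String]("id"), row.getAs[String]("name"))
                   }
  // reuse the SparkContext of the incoming RDD to distribute the collected array again
  val myDataRDD = rdd.sparkContext.parallelize(myData)
}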

Upvotes: 1
