Modifying the structure of an RDD in Spark

Question

I am new to Spark/Scala.

val First: RDD[((Short, String), (Int, Double, Int))]

This is the structure of my RDD. I want to modify this structure to something like the one below:

val First: RDD[(Short, String, Int, Double, Int)]

I need this because I have another RDD with a different structure, and I want to UNION the two RDDs. (The structure must be the same for a UNION operation.)

Please suggest an option.

gsamaras · Accepted Answer

Just map your data, like this:

First.map{ case ( (x, y), (k, z, w) ) => (x, y, k, z, w) }

To write this map function, check the element type of your RDD, ((Short, String), (Int, Double, Int)), which is what I matched with the pattern ((x, y), (k, z, w)), and then write the shape you want on the right-hand side of =>.
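As a minimal sketch of how the flattened RDD can then be unioned with one that already has the flat shape: the object name FlattenAndUnion, the helper flattenRow, and the sample rows are made up for illustration, and the example assumes Spark is on the classpath and runs in local mode.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object FlattenAndUnion {
  // Hypothetical helper: turns one nested element into a flat 5-tuple.
  def flattenRow(row: ((Short, String), (Int, Double, Int))): (Short, String, Int, Double, Int) =
    row match { case ((x, y), (k, z, w)) => (x, y, k, z, w) }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; on a cluster the master is usually set externally.
    val spark = SparkSession.builder().master("local[*]").appName("flatten-and-union").getOrCreate()
    val sc = spark.sparkContext

    // Sample data (assumed) with the nested shape from the question.
    val first: RDD[((Short, String), (Int, Double, Int))] =
      sc.parallelize(Seq(((1.toShort, "a"), (10, 1.5, 100))))

    // Sample data (assumed) that already has the flat shape.
    val second: RDD[(Short, String, Int, Double, Int)] =
      sc.parallelize(Seq((2.toShort, "b", 20, 2.5, 200)))

    // After the map, both RDDs have the same element type, so union compiles.
    val combined: RDD[(Short, String, Int, Double, Int)] = first.map(flattenRow).union(second)

    combined.collect().foreach(println)
    spark.stop()
  }
}
```

Because RDD is invariant in its element type, union only compiles when both sides have exactly the same tuple type, which is why the map has to come first.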

Edit for the comment:

"As map will iterate over the data one by one"

Spark applies a transformation only when an action occurs, so map() works really well in a distributed manner. Each partition applies the map function to its own data.

It's not a very costly operation though, so don't focus on that; focus on your join, which is the heavy operation. A map function should be cheap, provided your cluster has the resources for your amount of data.
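To see the laziness in practice, here is a small sketch that counts how often the map function runs, using an accumulator. The object name LazyMapDemo, the run helper, and the numbers are assumptions for illustration. Accumulators updated inside transformations can be incremented more than once if a task is retried, so treat this as a demonstration in local mode, not as a reliable counter.

```scala
import org.apache.spark.sql.SparkSession

object LazyMapDemo {
  // Hypothetical helper: returns the number of map calls seen before and after the action.
  def run(spark: SparkSession): (Long, Long) = {
    val sc = spark.sparkContext
    val calls = sc.longAccumulator("map calls")

    // 100 elements spread over 4 partitions (both numbers are arbitrary).
    val data = sc.parallelize(1 to 100, numSlices = 4)

    // Declaring the transformation only records it in the lineage; nothing runs yet.
    val doubled = data.map { n => calls.add(1); n * 2 }
    val before = calls.sum

    // count() is an action: it starts a job, and each partition applies
    // the map function to its own elements.
    doubled.count()
    val after = calls.sum

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("lazy-map-demo").getOrCreate()
    val (before, after) = run(spark)
    println(s"map calls before action: $before, after action: $after")
    spark.stop()
  }
}
```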
