Reputation: 81
I am new to Spark/Scala.
val First: RDD[((Short, String), (Int, Double, Int))]
This is the structure of my RDD. I want to modify this structure into something like the following:
val First: RDD[(Short, String , Int, Double, Int)]
I have another RDD with a different structure and I want to UNION the two RDDs (the structures must be the same for a UNION operation).
Please suggest an option.
Upvotes: 2
Views: 178
Reputation: 73444
Just map your data, like this:
First.map{ case ( (x, y), (k, z, w) ) => (x, y, k, z, w) }
To write this map function, check the format of your RDD, ((Short, String), (Int, Double, Int)), which is what I matched as ((x, y), (k, z, w)), and then write the format you want on the right side of =>.
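To show how the flattened RDD then lines up for the union mentioned in the question, here is a minimal sketch. It assumes a local SparkContext and made-up sample rows; the names FlattenAndUnion, flatten, first and second are hypothetical and only there for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FlattenAndUnion {
  // Turn the nested ((Short, String), (Int, Double, Int)) pair into one flat 5-tuple.
  def flatten(rdd: RDD[((Short, String), (Int, Double, Int))]): RDD[(Short, String, Int, Double, Int)] =
    rdd.map { case ((x, y), (k, z, w)) => (x, y, k, z, w) }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just for trying the idea out.
    val conf = new SparkConf().setAppName("flatten-union-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data with the nested structure from the question.
      val first: RDD[((Short, String), (Int, Double, Int))] =
        sc.parallelize(Seq(((1.toShort, "a"), (10, 1.5, 100))))
      // Made-up sample data that already has the flat structure.
      val second: RDD[(Short, String, Int, Double, Int)] =
        sc.parallelize(Seq((2.toShort, "b", 20, 2.5, 200)))

      // Both sides now have the same element type, so union compiles.
      val combined = flatten(first).union(second)
      combined.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that union does not remove duplicates; if you need that, you would have to call distinct() afterwards, which is a more expensive step.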
Edit for the comment:
"As Map will iterate data one by one"
Spark is lazy: it applies the transformation only when an action occurs, and map() then runs in a distributed manner, with each partition applying the map function to its own data.
It's not a very costly operation, though, so don't focus on that; focus on your join, which is the heavy operation. A map function should be cheap, provided your cluster has adequate resources for your amount of data.
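If you want to see the lazy behaviour for yourself, something like the sketch below could help. It assumes a local SparkContext (Spark 2.0 or later, for longAccumulator) and made-up sample rows; LazyMapSketch and mapCalls are hypothetical names. The accumulator only counts how often the map function runs, it is not something you would need in real code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyMapSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just for experimenting.
    val conf = new SparkConf().setAppName("lazy-map-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Counts invocations of the map function across all partitions.
      val calls = sc.longAccumulator("mapCalls")

      // Made-up sample data with the nested structure from the question.
      val first = sc.parallelize(Seq(
        ((1.toShort, "a"), (10, 1.5, 100)),
        ((2.toShort, "b"), (20, 2.5, 200))
      ))

      // Only records the transformation; no data is processed here.
      val flat = first.map { case ((x, y), (k, z, w)) =>
        calls.add(1)
        (x, y, k, z, w)
      }
      println(s"map calls before any action: ${calls.value}")

      // count() is an action, so the map function runs now, partition by partition.
      flat.count()
      println(s"map calls after count(): ${calls.value}")
    } finally {
      sc.stop()
    }
  }
}
```

The first line printed should report zero calls, because nothing has triggered the job yet. After count() the number should match the number of records, unless tasks were retried, in which case an accumulator updated inside a transformation can count some records more than once.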
Upvotes: 1