Reputation: 1082
This question has been asked before, but I could not understand the answers properly.
I have two RDDs with the same number of columns and the same number of records:
RDD1(col1,col2,col3)
and
RDD2(colA,colB,colC)
I need to join them as follows:
RDD_FINAL(col1,col2,col3,colA,colB,colC)
There is no key to perform a join between the records, but they are in order, which means the first record of RDD1 corresponds to the first record of RDD2.
Upvotes: 0
Views: 1408
Reputation: 2333
Adding a code snippet for Alfilercio's example.
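// col1..col3 and colA..colC are placeholders for the column types; each record is a Tuple3.
JavaRDD<Tuple3<col1,col2,col3>> rdd1 = ...
// zipWithIndex numbers records by position, so index i refers to the i-th record in both RDDs.
// (zipWithUniqueId ids depend on the partition layout and only line up if both RDDs are partitioned identically.)
JavaPairRDD<Long, Tuple3<col1,col2,col3>> pairRdd1 = rdd1.zipWithIndex().mapToPair(pair -> new Tuple2<>(pair._2(), pair._1()));
JavaRDD<Tuple3<colA,colB,colC>> rdd2 = ...
JavaPairRDD<Long, Tuple3<colA,colB,colC>> pairRdd2 = rdd2.zipWithIndex().mapToPair(pair -> new Tuple2<>(pair._2(), pair._1()));
// Join on the index, then drop it and keep only the paired records.
JavaRDD<Tuple2<Tuple3<col1, col2, col3>, Tuple3<colA,colB,colC>>> mappedRdd = pairRdd1.join(pairRdd2).map(pair -> pair._2());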
Upvotes: 1
Reputation: 1118
You can use the zipWithIndex method to add the row index as a key to both RDDs, and then join them by that key.
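To make the idea concrete without a cluster, here is a minimal plain-Java sketch of what "key by index, then join" computes. It does not use Spark at all: the class `IndexJoinSketch` and its methods `keyByIndex` and `joinByIndex` are hypothetical names made up for illustration, and the records are modelled as lists of strings rather than real RDD rows. `keyByIndex` plays the role of zipWithIndex plus a key/value swap, and `joinByIndex` plays the role of the join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "zipWithIndex, then join on the index". No Spark involved.
final class IndexJoinSketch {

    // Mimics zipWithIndex followed by a key/value swap: position -> record.
    static <T> Map<Long, T> keyByIndex(List<T> records) {
        Map<Long, T> keyed = new HashMap<>();
        for (int i = 0; i < records.size(); i++) {
            keyed.put((long) i, records.get(i));
        }
        return keyed;
    }

    // Mimics an inner join on the index and concatenates the columns of matching rows.
    // Rows are emitted in index order here; a real distributed join gives no such guarantee.
    static List<List<String>> joinByIndex(List<List<String>> left, List<List<String>> right) {
        Map<Long, List<String>> leftByIndex = keyByIndex(left);
        Map<Long, List<String>> rightByIndex = keyByIndex(right);
        List<List<String>> joined = new ArrayList<>();
        for (long i = 0; i < left.size(); i++) {
            List<String> rightRow = rightByIndex.get(i);
            if (rightRow == null) {
                continue; // inner join: skip indices with no partner on the right
            }
            List<String> row = new ArrayList<>(leftByIndex.get(i));
            row.addAll(rightRow);
            joined.add(row);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<List<String>> rdd1 = Arrays.asList(
                Arrays.asList("a1", "a2", "a3"),
                Arrays.asList("b1", "b2", "b3"));
        List<List<String>> rdd2 = Arrays.asList(
                Arrays.asList("x1", "x2", "x3"),
                Arrays.asList("y1", "y2", "y3"));
        // prints [[a1, a2, a3, x1, x2, x3], [b1, b2, b3, y1, y2, y3]]
        System.out.println(joinByIndex(rdd1, rdd2));
    }
}
```

In Spark itself the joined result is not guaranteed to come back in the original order, so if order matters you would probably want to sort by the index key before dropping it.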
Upvotes: 1