user2200660

Merge two RDDs in Spark Scala

I have two RDDs.

rdd1: RDD[(String, String)]

key1, value11
key2, value12
key3, value13

rdd2: RDD[(String, String)]

key2, value22
key3, value23
key4, value24

I need to build another RDD that merges the rows of rdd1 and rdd2. The output should look like:

key2, value12 ; value22
key3, value13 ; value23

In other words, I want to take the intersection of the keys of rdd1 and rdd2 and then join their values. The values must stay in order, i.e. value(rdd1) followed by value(rdd2), not the reverse.


Angelo Genovese

I think this may be what you are looking for:

join(otherDataset, [numTasks])  

When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.

See the associated section of the docs
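Applied to the data in the question, a minimal sketch might look like the following. It assumes Spark 1.3 or later (where the pair-RDD implicits such as `mapValues` and `sortByKey` are picked up automatically) and a local master for trying it out; `MergeRdds` and `mergeByKey` are hypothetical names chosen for illustration. Because `rdd1` is the receiver of `join`, each value pair comes back as `(value from rdd1, value from rdd2)`, which keeps the order asked for, and the inner join drops `key1` and `key4` on its own.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object for illustration only.
object MergeRdds {
  // Inner join keeps only keys present in both RDDs. rdd1's value is the
  // left element of each (V, W) pair, so the result reads "value(rdd1) ; value(rdd2)".
  def mergeByKey(rdd1: RDD[(String, String)],
                 rdd2: RDD[(String, String)]): RDD[(String, String)] =
    rdd1.join(rdd2).mapValues { case (v1, v2) => s"$v1 ; $v2" }

  def main(args: Array[String]): Unit = {
    // Local mode, so the example runs without a cluster.
    val conf = new SparkConf().setAppName("merge-rdds").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(Seq("key1" -> "value11", "key2" -> "value12", "key3" -> "value13"))
      val rdd2 = sc.parallelize(Seq("key2" -> "value22", "key3" -> "value23", "key4" -> "value24"))
      // join makes no promise about output order, so sort by key before printing.
      mergeByKey(rdd1, rdd2)
        .sortByKey()
        .collect()
        .foreach { case (k, v) => println(s"$k, $v") }
    } finally {
      sc.stop()
    }
  }
}
```

Running `main` should print `key2, value12 ; value22` and `key3, value13 ; value23`. The `sortByKey` call is only there to make the printed order deterministic; it is not needed for the merge itself.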
