MetallicPriest

Reputation: 30825

How to join two RDDs with mutually exclusive keys

Say I have two Spark RDDs with the following values:

x = [(1, 3), (2, 4)]

and

y = [(3, 5), (4, 7)]

and I want to have

z = [(1, 3), (2, 4), (3, 5), (4, 7)]

How can I achieve this? I know I can use an outer join followed by a map, but is there a more direct way?

Upvotes: 0

Views: 155

Answers (2)

maasg

Reputation: 37435

rdd.union(otherRDD) returns the union of the two RDDs, which is the result the question asks for:

x.union(y)

Upvotes: 6

Cory Kramer

Reputation: 118001

You can just use the + operator. For lists, it performs concatenation:

>>> x = [(1, 3), (2, 4)]
>>> y = [(3, 5), (4, 7)]
>>> z = x + y
>>> z
[(1, 3), (2, 4), (3, 5), (4, 7)]

Upvotes: 0
