Reputation: 30825
Say I have two Spark RDDs with the following values
x = [(1, 3), (2, 4)]
and
y = [(3, 5), (4, 7)]
and I want to have
z = [(1, 3), (2, 4), (3, 5), (4, 7)]
How can I achieve this? I know you can use outerJoin followed by map, but is there a more direct way?
Upvotes: 0
Views: 155
Reputation: 37435
rdd.union(otherRDD)
gives you the union of the two RDDs, which is the result the question asks for:
x.union(y)
Upvotes: 6
Reputation: 118001
You can just use the + operator. In the context of lists, this is a concatenation operation.
>>> x = [(1, 3), (2, 4)]
>>> y = [(3, 5), (4, 7)]
>>> z = x + y
>>> z
[(1, 3), (2, 4), (3, 5), (4, 7)]
Upvotes: 0