aaa

Reputation: 89

Combine two RDDs in PySpark

I have two RDDs in PySpark:

rdd1=sc.parallelize(['a','b'])
rdd2=sc.parallelize(['c','d'])

I want to generate an RDD of pairs, where each pair contains one element from each RDD: [(a,c),(b,c),(a,d),(b,d)]. I tried:

rdd3=rdd1.map(lambda x:x)+rdd2.map(lambda y:y)

but it failed.

Upvotes: 0

Views: 41

Answers (1)

Stanislas Morbieu

Reputation: 1827

You are looking for the Cartesian product:

rdd1.cartesian(rdd2)
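To make that concrete, here is a minimal sketch. The helper names `cartesian_local` and `cartesian_spark` are mine, purely for illustration. The first is a plain-Python picture of what `cartesian` computes; the second runs the same thing on Spark and assumes a working PySpark installation (the import is done lazily so the local part runs without it).

```python
from itertools import product


def cartesian_local(xs, ys):
    """Plain-Python picture of rdd1.cartesian(rdd2): every (x, y) pair."""
    return list(product(xs, ys))


def cartesian_spark(xs, ys):
    """Same idea on Spark. Hypothetical helper; needs PySpark installed."""
    from pyspark import SparkContext  # lazy import, only needed here

    sc = SparkContext.getOrCreate()
    rdd1 = sc.parallelize(xs)
    rdd2 = sc.parallelize(ys)
    # cartesian() returns an RDD of 2-tuples; collect() brings them to the driver
    return rdd1.cartesian(rdd2).collect()


if __name__ == "__main__":
    # Sorted, because the order of the pairs should not be relied on
    print(sorted(cartesian_local(['a', 'b'], ['c', 'd'])))
```

Sort the result (or compare as sets) if you need to check it against the list in the question, since the order of pairs coming back from Spark depends on partitioning. Also, as far as I know `+` on two RDDs is a union, so your attempt would give the four elements a, b, c, d rather than pairs.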

Upvotes: 1
