Reputation: 89
I have two RDDs in PySpark:
rdd1 = sc.parallelize(['a', 'b'])
rdd2 = sc.parallelize(['c', 'd'])
I want to generate an RDD of pairs that combine one element from each RDD, i.e. [('a','c'), ('b','c'), ('a','d'), ('b','d')]. I tried
rdd3 = rdd1.map(lambda x: x) + rdd2.map(lambda y: y)
but it failed.
Upvotes: 0
Views: 41
Reputation: 1827
You are looking for the Cartesian product:
rdd1.cartesian(rdd2)
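For example, here is a minimal runnable sketch (the local SparkContext setup is my assumption; the question already has sc defined). Note that the order of the pairs in the collected list may differ from the order shown in the question:
from pyspark import SparkContext

# assumed local setup; any existing SparkContext works the same way
sc = SparkContext("local", "cartesian-example")

rdd1 = sc.parallelize(['a', 'b'])
rdd2 = sc.parallelize(['c', 'd'])

# cartesian() pairs every element of rdd1 with every element of rdd2
rdd3 = rdd1.cartesian(rdd2)

print(rdd3.collect())
# [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]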
Upvotes: 1