user3849475

Reputation: 339

Spark two RDD join issue

I have two RDDs.

moviesRDD = [(1, 'monster'), (2, 'minions 3D'), ...]  # (movieID, title)
ratingsRDD = [(1, (3, 4)), (2, (4, 5)), ...]  # (movieID, (numbersofrating, avg_rating))

The desired result is:

newRDD = [(3, 'monster', 4), (4, 'minions 3D', 5), ...]  # (numbersofrating, title, avg_rating)

I am not sure how to get newRDD from these two RDDs.

Upvotes: 0

Views: 95

Answers (1)

zero323

Reputation: 330453

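This should do the trick. Joining on the key gives elements of the form (movieID, (title, (numbersofrating, avg_rating))); dropping the key and reordering the fields yields the tuples you want: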

(moviesRDD
    .join(ratingsRDD) # Join by key
    .values() # Extract values
    .map(lambda x: (x[1][0], x[0], x[1][1]))) # Reshape
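
To make the intermediate shapes easier to follow, here is a minimal plain-Python sketch that mimics the three steps on the sample data from the question. It does not use Spark at all; `join_by_key` is a hypothetical helper written only for illustration, assuming it behaves like an inner join on the key and produces the same (key, (left_value, right_value)) shape as `RDD.join`.

```python
# Plain-Python stand-ins for the two RDDs (sample data from the question).
movies = [(1, 'monster'), (2, 'minions 3D')]   # (movieID, title)
ratings = [(1, (3, 4)), (2, (4, 5))]           # (movieID, (numbersofrating, avg_rating))


def join_by_key(left, right):
    """Hypothetical helper: inner join of two lists of (key, value) pairs,
    mimicking the (key, (left_value, right_value)) shape of RDD.join."""
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    return [(key, (left_value, right_value))
            for key, left_value in left
            for right_value in right_by_key.get(key, [])]


# Step 1: join by key.
# Each element: (movieID, (title, (numbersofrating, avg_rating)))
joined = join_by_key(movies, ratings)

# Step 2: drop the key, like .values().
# Each element: (title, (numbersofrating, avg_rating))
values = [value for _, value in joined]

# Step 3: reshape with the same index expression as the lambda above.
# Each element: (numbersofrating, title, avg_rating)
new_rows = [(x[1][0], x[0], x[1][1]) for x in values]

print(new_rows)
```

In the lambda, `x[0]` is the title and `x[1]` is the (numbersofrating, avg_rating) pair, which is why the indices look the way they do. Note that on a real cluster the order of the joined elements is not guaranteed.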

Upvotes: 1
