Reputation: 720
I am new to Scala and Spark. I have an RDD whose data looks like this:
[
(key1, compactbuffer(item1, item2, item3)),
(key2, compactbuffer(item3, item4))
.....
]
The other RDD is:
[item1, item2, item3, item4, item5, item6]
// it's ordered.
For each key, I want the items from the second RDD that are not in that key's buffer, like this:
[
(key1, compactbuffer(item4, item5, item6)),
(key2, compactbuffer(item1, item2, item5, item6))
]
How do I achieve this?
Upvotes: 0
Views: 574
Reputation: 13927
Assuming the two RDDs are named grouped and master, this should do it:
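// Pair every (key, items) with every master item, keep the pairs whose
// item is missing from that key's buffer, then regroup by key.
// Assumes each buffer holds the items themselves, as shown in the question.
grouped.cartesian(master)
  .filter { case ((_, items), item) => !items.exists(_ == item) }
  .map { case ((key, _), item) => (key, item) }
  .groupByKey()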
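If master is small enough to collect on the driver, a broadcast plus mapValues may be a cheaper alternative to the cartesian product, and it keeps the items in master order (the order of values after a shuffle is not guaranteed). This is only a sketch under that assumption: the object name, the complement helper, the String item type, and the sample data are hypothetical and not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ComplementPerKey {
  // Items of `master`, in master order, that do not occur in `items`.
  def complement(master: Seq[String], items: Iterable[String]): Seq[String] = {
    val present = items.toSet
    master.filterNot(present)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("complement-per-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data mirroring the question.
      val grouped = sc.parallelize(Seq(
        ("key1", Seq("item1", "item2", "item3")),
        ("key2", Seq("item3", "item4"))
      ))
      val master = sc.parallelize(
        Seq("item1", "item2", "item3", "item4", "item5", "item6"))

      // Assumption: master fits in driver memory, so it can be
      // collected once and shipped to every executor.
      val masterLocal = sc.broadcast(master.collect().toList)

      // For each key, keep the master items absent from its buffer.
      val result = grouped.mapValues(items => complement(masterLocal.value, items))
      result.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

This avoids materialising every (key, item) pair and the shuffle behind groupByKey. If master is too large to broadcast, the cartesian version above is the fallback.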
Upvotes: 1