Reputation: 155
I am new to Scala. I have two RDDs of the following type:
RDD[(Long, List[Long])]
For each key, I want to subtract the values in the second RDD's List[Long] from the first RDD's List[Long].
For example:
rddPair1 contains:
((4,List(5)), (1,List(2)), (2,List(4, 3, 4)), (3,List(6, 4)))
rddPair2 contains:
((5,List(6)), (2,List(3)), (3,List(4)))
I want the resulting RDD to look like this:
(4,List(5)), (1,List(2)), (2,List(4, 4)), (3,List(6))
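Keys 2 and 3 appear in both RDDs, so for these keys the List values from rddPair2 are subtracted from the values in rddPair1. Keys 4 and 1 appear only in rddPair1 and are kept unchanged; key 5 appears only in rddPair2 and is dropped.
Thanks in advance.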
Upvotes: 2
Views: 972
Reputation: 37852
You can use leftOuterJoin and then map the results to get the desired format:
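import org.apache.spark.rdd.RDD

// The element types must match the inputs, which are RDD[(Long, List[Long])]
val result: RDD[(Long, List[Long])] = rddPair1.leftOuterJoin(rddPair2).mapValues {
  case (l1, Some(l2)) => l1.diff(l2) // match found - remove l2 from l1
  case (l1, None)     => l1          // no match - keep l1 as is
}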
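To check the per-key logic without starting Spark, here is a minimal sketch that applies the same two cases to plain Scala Maps holding the sample data from the question. The names SubtractListsSketch, pair1, pair2 and subtract are made up for illustration, and right.get(k) only stands in for the Option that leftOuterJoin produces; it is not how Spark performs the join.

```scala
object SubtractListsSketch {
  // Sample data from the question, held in plain Maps instead of RDDs (assumption for illustration).
  val pair1: Map[Long, List[Long]] = Map(
    4L -> List(5L), 1L -> List(2L), 2L -> List(4L, 3L, 4L), 3L -> List(6L, 4L))
  val pair2: Map[Long, List[Long]] = Map(
    5L -> List(6L), 2L -> List(3L), 3L -> List(4L))

  // Same two cases as the mapValues above: right.get(k) plays the role of
  // the Option[List[Long]] that leftOuterJoin attaches to each left-side key.
  def subtract(left: Map[Long, List[Long]],
               right: Map[Long, List[Long]]): Map[Long, List[Long]] =
    left.map { case (k, l1) =>
      right.get(k) match {
        case Some(l2) => k -> l1.diff(l2) // key on both sides: remove l2's elements from l1
        case None     => k -> l1          // key only on the left: keep l1 as is
      }
    }

  def main(args: Array[String]): Unit = {
    // Sort by key only to make the printed order deterministic.
    subtract(pair1, pair2).toSeq.sortBy(_._1).foreach(println)
    // prints:
    // (1,List(2))
    // (2,List(4, 4))
    // (3,List(6))
    // (4,List(5))
  }
}
```

Note that diff is a multiset difference: each element of l2 removes at most one occurrence from l1, so List(4, 3, 4).diff(List(4)) gives List(3, 4). If every occurrence should be removed instead, l1.filterNot(l2.contains) is one alternative.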
Upvotes: 3