Darshan Manek

Reputation: 155

Subtract two RDDs that contain a List as value in Spark/Scala

I am new to Scala. I have two RDDs of the following type:

RDD[(Long, List[Long])]

For each matching key, I want to subtract the values in the second RDD's List[Long] from the first RDD's List[Long].

For example:

rddPair1 contains:

((4,List(5)), (1,List(2)), (2,List(4, 3, 4)), (3,List(6, 4)))

rddPair2 contains:

((5,List(6)), (2,List(3)), (3,List(4)))

I want the resulting RDD to look something like this:

(4,List(5)), (1,List(2)), (2,List(4, 4)), (3,List(6))

Note that keys 2 and 3 match, and for these keys the List value from rddPair2 is subtracted from the List value in rddPair1.

Thanks in advance.

Upvotes: 2

Views: 972

Answers (1)

Tzach Zohar

Reputation: 37852

You can use leftOuterJoin and then map the results to get the desired format:

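import org.apache.spark.rdd.RDD

// Types follow the question: keys and list elements are Long, not Int
val result: RDD[(Long, List[Long])] = rddPair1.leftOuterJoin(rddPair2).mapValues {
  case (l1, Some(l2)) => l1.diff(l2) // match found - remove l2 from l1
  case (l1, None) => l1              // no match  - keep l1 as is
}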
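
To see what the join-then-diff step does without a Spark cluster, here is a minimal sketch on plain Scala collections. The object name `SubtractListsSketch` and the helper `subtractByKey` are hypothetical, introduced only for illustration. A `Map` lookup stands in for `leftOuterJoin`, which assumes each key appears at most once on the right side.

```scala
object SubtractListsSketch {
  // Plain-collection stand-in for rddPair1.leftOuterJoin(rddPair2).mapValues { ... }.
  // Assumption: keys in `right` are unique, so a Map lookup mimics the join.
  def subtractByKey(
      left: Seq[(Long, List[Long])],
      right: Seq[(Long, List[Long])]
  ): Seq[(Long, List[Long])] = {
    val rightByKey: Map[Long, List[Long]] = right.toMap
    left.map { case (key, values) =>
      rightByKey.get(key) match {
        // match found: diff removes one occurrence per element of toRemove
        case Some(toRemove) => key -> values.diff(toRemove)
        // no match: keep the left list unchanged
        case None => key -> values
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Sample data copied from the question
    val pair1 = Seq(4L -> List(5L), 1L -> List(2L), 2L -> List(4L, 3L, 4L), 3L -> List(6L, 4L))
    val pair2 = Seq(5L -> List(6L), 2L -> List(3L), 3L -> List(4L))
    println(subtractByKey(pair1, pair2))
  }
}
```

On the sample data this yields (4,List(5)), (1,List(2)), (2,List(4, 4)), (3,List(6)), matching the result asked for in the question. Note that `diff` is a multiset difference: it removes only as many occurrences as appear in the second list, so List(4, 3, 4).diff(List(4)) leaves List(3, 4). If every occurrence should be removed instead, something like `l1.filterNot(l2.contains)` may be closer to what you want.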

Upvotes: 3
