Reputation: 19
I have two Spark RDDs: RDD1: RDD[(String, String, Int)] and RDD2: RDD[(String, String, Int)].
RDD1 is the original data and RDD2 is the distinct of RDD1.
I need to create an RDD3 which is RDD1 - RDD2.
For example:
RDD1: [("one","one",23)],[("one","one",23)],[("two","two",28)],[("one","one",23)]
RDD2: [("one","one",23)],[("two","two",28)]
Expected
RDD3: [("one","one",23)],[("one","one",23)]
That is, only the duplicates, with the count of each duplicate reduced by 1. RDD3 is a collection of only the duplicates: for example, if there are 10 identical transactions, 1 is kept as unique, so I should collect the other 9 transactions in RDD3.
Upvotes: 0
Views: 79
Reputation: 1
Note that subtract removes every occurrence of a matching record, so subtracting the distinct RDD directly would leave nothing. Pairing each record with a unique index via zipWithIndex first makes every row distinct, so subtract removes exactly one indexed copy per record:

// Pair each record with a unique index so duplicate records become distinguishable
val rdd1 = sc.parallelize(List(("A", "ANT", 1), ("A", "ANT", 1), ("B", "BALL", 2), ("C", "CAT", 3), ("C", "CAT", 3))).zipWithIndex()
rdd1.collect().foreach(r => print(r))

// Keep exactly one (record, index) pair per distinct record (the lowest index)
val rdd2 = rdd1.reduceByKey((a, b) => math.min(a, b))
rdd2.collect().foreach(r => print(r))

// Remove the one kept copy of each record; what remains are the extra duplicates
val rdd3 = rdd1.subtract(rdd2).map(p => p._1)
rdd3.collect().foreach(r => print(r))
Result:
RDD1: ((A,ANT,1),0)((A,ANT,1),1)((B,BALL,2),2)((C,CAT,3),3)((C,CAT,3),4)
RDD2: ((A,ANT,1),0)((C,CAT,3),3)((B,BALL,2),2)
RDD3: (C,CAT,3)(A,ANT,1)
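An alternative sketch of the same idea, without zipWithIndex: count the occurrences of each record with reduceByKey, then re-emit each record count - 1 times (everything except the one copy that distinct would keep). The object name and the local master setting are just for this example:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DuplicatesOnly {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dups").setMaster("local[*]"))

    val rdd1 = sc.parallelize(List(
      ("one", "one", 23), ("one", "one", 23), ("two", "two", 28), ("one", "one", 23)))

    // Count each distinct record, then re-emit it (count - 1) times,
    // i.e. all copies except the single one kept by distinct.
    val rdd3 = rdd1
      .map(rec => (rec, 1))
      .reduceByKey(_ + _)
      .flatMap { case (rec, n) => List.fill(n - 1)(rec) }

    // Yields ("one","one",23) twice, matching the expected RDD3
    rdd3.collect().foreach(println)

    sc.stop()
  }
}
```

This avoids a subtract (which triggers a shuffle of the full indexed data set) at the cost of a flatMap that materializes the repeated records.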
Upvotes: 0