Reputation: 365
Suppose I have 2 RDDs, where RDD1 has (key1, key2, value) and RDD2 has (key1, value). Now I want to apply an operation (like + or -) from RDD2 to RDD1 wherever key1 matches. Here is an example:
RDD1 has [(1, 1, 3), (1, 2, 2), (2, 2, 5)]
RDD2 = sc.parallelize([(1, 1)])
I want the resulting RDD3 to be [(1, 1, 4), (1, 2, 3), (2, 2, 5)]: only the first and second rows had values added, while the third one didn't.
I tried to use a left outer join to find matches on key1 and apply the operation, but then I lose the rows that don't need the operation. Is there a way to apply an operation to only part of the data?
Upvotes: 0
Views: 474
Reputation: 330083
Assuming you want pairwise operations or your data contains 1-to-0..1 relationships, the simplest thing you can do is to convert both RDDs to DataFrames:
from pyspark.sql.functions import coalesce, lit

df1 = sc.parallelize([
    (1, 1, 3), (1, 2, 2), (2, 2, 5)
]).toDF(("key1", "key2", "value"))

df2 = sc.parallelize([(1, 1)]).toDF(("key1", "value"))

new_value = (
    df1["value"] +                    # Old value
    coalesce(df2["value"], lit(0))    # If no match (NULL) take 0
).alias("value")                      # Set alias

df1.join(df2, ["key1"], "leftouter").select("key1", "key2", new_value)
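For the sample data this yields (1, 1, 4), (1, 2, 3) and (2, 2, 5) (in no particular order); rows with no match in df2 keep their original value. If you would rather stay with plain RDDs, the same pattern works with leftOuterJoin, defaulting the missing right side to 0. A minimal sketch, assuming the tuple layout from the question:

rdd1 = sc.parallelize([(1, 1, 3), (1, 2, 2), (2, 2, 5)])
rdd2 = sc.parallelize([(1, 1)])

result = (rdd1
    .map(lambda r: (r[0], (r[1], r[2])))   # key by key1: (key1, (key2, value))
    .leftOuterJoin(rdd2)                   # unmatched keys get None on the right
    .map(lambda kv: (kv[0],                # rebuild (key1, key2, value)
                     kv[1][0][0],
                     kv[1][0][1] + (kv[1][1] if kv[1][1] is not None else 0))))

result.collect()  # the three rows above, order not guaranteed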
You can easily adjust this to handle other scenarios by applying an aggregation on df2 before joining the DataFrames.
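For instance, assuming RDD2 may contain several rows per key1, you could sum them first so the join stays 1-to-0..1 (df2_agg is just an illustrative name):

from pyspark.sql.functions import sum as sum_

df2_agg = df2.groupBy("key1").agg(sum_("value").alias("value"))

df1.join(df2_agg, ["key1"], "leftouter").select(
    "key1", "key2",
    (df1["value"] + coalesce(df2_agg["value"], lit(0))).alias("value")
)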
Upvotes: 1