Reputation: 41
Is there a way I can get the name of the key inside the reduceByKey() function in PySpark, so I can tell which key is common to the two values passed into the reduce function?
For example:
inside reduceByKey(combineValues), where
def combineValues(a, b):
    # can I get the key common to both a and b here?
    return a + b
Upvotes: 1
Views: 1021
Reputation: 67085
The function passed to reduceByKey only ever sees values, never the key. You can use the aggregate function on the RDD instead, but then you lose the HashPartitioner benefit, so I would suggest storing the key inside your values if it's important.
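To illustrate the key-in-value suggestion: map each record to `(key, (key, value))` before reducing, so the combine function can read the key from either argument. The sketch below simulates `reduceByKey` locally with a dict and `functools.reduce` (no Spark cluster assumed); in PySpark the equivalent would be `rdd.map(lambda kv: (kv[0], (kv[0], kv[1]))).reduceByKey(combineValues)`.

```python
from functools import reduce
from collections import defaultdict

def combineValues(a, b):
    # a and b are (key, value) tuples, so the common key is now visible
    key = a[0]
    return (key, a[1] + b[1])

pairs = [("x", 1), ("x", 2), ("y", 10), ("x", 3), ("y", 4)]

# Store the key inside the value, as the answer suggests
tagged = [(k, (k, v)) for k, v in pairs]

# Simulate reduceByKey: group the tagged values by key, then reduce each group
groups = defaultdict(list)
for k, kv in tagged:
    groups[k].append(kv)

result = {k: reduce(combineValues, vs) for k, vs in groups.items()}
print(result)  # {'x': ('x', 6), 'y': ('y', 14)}
```

The trade-off is some extra memory per record for the duplicated key, but the pairing keeps `reduceByKey`'s partitioning behavior intact.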
Upvotes: 1